The presentation with slides is available here (Silverlight required).
Some bits I found interesting:
- He talked about several patterns for distributed systems that Google has found useful.
- Google currently MapReduces through about an exabyte of data per month
- Interesting example of how they use MapReduce - to return the relevant map tiles in Google Maps for a given query
- He pointed out that they run BigTable as shared service clusters so that each group doesn't have to maintain its own. This relates to what Stu and I are doing at Rackspace: creating multi-tenant Hadoop and Cassandra clusters for similar reasons.
- They use ranged distribution of keys for BigTable. He said that consistent hashing is good in some ways, but they wanted locality for sequences of keys.
- He talked about something I've been looking at recently - how to do custom multi-datacenter replication by table (or for Cassandra by keyspace).
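
To make the ranged-versus-hashed key distribution point concrete, here is a minimal sketch of my own, not anything shown in the talk. The class names (`RangePartitioner`, `HashRing`), the node names, and the split points are all hypothetical, and the ring is the simplest possible form of consistent hashing (one position per node, no virtual nodes). It is only meant to show why range partitioning keeps neighbouring keys together while hashing scatters them.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position (MD5 used only for spreading)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class RangePartitioner:
    """Assigns keys to nodes by sorted key range (hypothetical illustration)."""

    def __init__(self, split_points, nodes):
        # N split points divide the key space into N + 1 contiguous ranges.
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        self.split_points = sorted(split_points)
        self.nodes = list(nodes)

    def node_for(self, key: str) -> str:
        # bisect_right finds which range the key falls into.
        return self.nodes[bisect.bisect_right(self.split_points, key)]


class HashRing:
    """Bare-bones consistent hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping at the end.
        i = bisect.bisect_left(self.positions, _hash(key)) % len(self.ring)
        return self.ring[i][1]


nodes = ["node-a", "node-b", "node-c"]
keys = ["user100", "user101", "user102"]  # a run of adjacent keys

ranged = RangePartitioner(split_points=["m", "t"], nodes=nodes)
ring = HashRing(nodes)

# All three keys sort after "t", so they share one range and one node (node-c).
print("range:", [ranged.node_for(k) for k in keys])

# Placement here depends only on each key's hash, so adjacency is not preserved.
print("ring: ", [ring.node_for(k) for k in keys])
```

With range partitioning, a scan over a run of adjacent keys touches one node (or a few neighbouring ones). With the ring, where each key lands depends on its hash, so a sequential scan may fan out across the cluster. That is the locality trade-off mentioned above.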