Friday, March 26, 2010

NOSQL

As I get up to speed at my new job at Rackspace down here in Texas, I've come into a new world called NoSQL. NoSQL is a term that Eric Evans re-coined relatively recently, and he has since clarified that it means "Not only SQL". It loosely describes a set of distributed databases that share some similar properties.

The usual suspects include Google's BigTable, Hadoop's HBase, Amazon's Dynamo, Apache's Cassandra, CouchDB, MongoDB, and Voldemort, among others.

It seems to be based on the notion that if you have really, really, really large data sets, you run into the limits that a relational database imposes with ACID properties and transactions, and into the unattainable triforce of Consistency, Availability, and Partition-tolerance (the CAP Theorem says a distributed system can't guarantee all three at once). Jonathan Ellis blogged about deciding whether you should consider a NoSQL solution here.

So I've started drinking from a firehose of sources to try to understand more about these systems. We've been looking heavily into pieces of the Hadoop project for its distributed filesystem and Map/Reduce implementation (not exactly NoSQL, but siblings to HBase), as well as the Cassandra project, because it brings together useful features of BigTable and Dynamo and allows for completely horizontal scaling with no single point of failure.
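For anyone who hasn't seen the Map/Reduce idea before, here is a tiny single-process sketch of it in Python. This is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would run each phase across many machines rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["NoSQL is not only SQL", "SQL is still useful"]
    print(word_count(docs))
```

The point is that the map and reduce steps only ever look at one document or one key at a time, which is what lets the work be spread horizontally across nodes.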

More about the subject:
http://www.royans.net/arch - a blog about scalable web architectures, often talking about big data and NoSQL
http://nosql.mypopescu.com - a blog called myNoSQL that deals with all things NoSQL
