“Cassandra” is the name of a highly distributed “nosql” database developed in Java at Facebook. Facebook released it to the public as open source in July 2008, it later moved to the Apache project as “Apache Cassandra”, and it is one of the more popular on-premises (or cloud) distributed databases today.
Like many other “nosql” databases, Cassandra offers simple name-value pair storage, although there are both row-like and column-like concepts in play.
Like many other distributed databases, Cassandra makes use of the concept of “eventual consistency”. The general concept is that, in a quiet state, all nodes will “eventually” get all updates from all other nodes and the entire dataset will be “consistent” across all nodes.
However, it is the behavior of this “eventually consistent” database when things are NOT quiet that gives it its scalable power: applications built on top of individual nodes of this database must continue to function and must respond to later information gracefully enough to prevent interruptions of end user service (which would otherwise be caused by waiting for a single master table to receive all updates).
Cassandra uses timestamps to reconcile distributed commits – another concept common in distributed databases but one that obviously depends on good timekeeping on far-flung nodes.
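To make the timestamp idea concrete, here is a minimal sketch of "last write wins" reconciliation in Python. It is an illustration only, not Cassandra's actual code: the names Cell, reconcile and merge are hypothetical, the integer timestamps are made up, and the tie-break rule (compare values when timestamps are equal) is an assumption chosen so that every node picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical stand-in for one stored name-value pair's value.
    value: str
    timestamp: int  # writer-supplied, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the winner between two versions of the same name-value pair."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: break the tie deterministically so every node agrees
    # (an assumption of this sketch).
    return a if a.value >= b.value else b


def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas' views; each maps a name to its latest known Cell."""
    merged = dict(local)
    for name, cell in remote.items():
        merged[name] = reconcile(merged[name], cell) if name in merged else cell
    return merged


node_a = {"email": Cell("old@example.com", 1_000)}
node_b = {"email": Cell("new@example.com", 2_000)}

# Once the nodes exchange updates, both hold the same data regardless of order.
converged_a = merge(node_a, node_b)
converged_b = merge(node_b, node_a)

# A writer whose clock runs slow loses even if its write really happened last.
skewed = reconcile(Cell("new@example.com", 2_000), Cell("newest@example.com", 1_500))
```

In this sketch both nodes end up with the same "email" value whichever direction the merge runs, which is the "eventually consistent" outcome described above. The last line shows why good timekeeping matters: a write stamped by a lagging clock is discarded in favor of an older write carrying a higher timestamp.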
Cassandra is frequently wrapped by a data type encapsulation layer or an object serialization layer (such as Thrift) to provide applications with a richer data storage experience than simple name-value pairs.
For more information about WHY and WHEN to use Cassandra or another nosql database, please see Andy White’s “Why Cassandra” article.
To learn how to install and use Cassandra on your own system, please see my “Installing and Running the Cassandra Database” article.