This article describes how to create a new keyspace on a Cassandra database server, how to add data to that keyspace and how to run some simple queries against that data.
If you do not know how to install and run Cassandra, please read my earlier article on that subject first.
You should also know a little about XML: specifically how to find and fix unmatched XML tags in case you typo during the following instructions. (Hint: file-by-file backups are a good idea until you get comfortable.)
Create a New Keyspace on Cassandra
First, sign on to your Cassandra server using the “cassandra-cli” client. Use the “show keyspaces” command to ensure you have a live connection to the server and to make sure the keyspace you are about to add doesn’t already exist.
cassandra> show keyspaces
Keyspace1
system
These two keyspaces are automatically installed when you installed Cassandra and are completely independent of one another – like separate databases on a relational database system would be. A diagram of two separate keyspaces in our Cassandra database would look like this:

We want to add a new keyspace called “ToyStore”. Once we’re done, we’d expect our diagram to look like this:

Now, we’re working with the current stable version (“0.6.3″ at the time I wrote this article) , so we’ll need to add our new keyspace by hand. Close down your server and the client and then go into your Cassandra “conf” directory. Open the “storage-conf.xml” file with your favorite text editor.
Around line 60 you’ll see a tag that looks like this:
<Keyspaces>
<Keyspace Name="Keyspace1">
When you scroll further you’ll see come to the end of the “Keyspaces” section. Note that there is no ‘<Keyspace Name=”system”>’ entry; all entries here are for “user” databases.
After the XML tags for “keyspace1″, please add the following XML snippet to create an new, empty keyspace called “ToyStore”. This snippet also creates a “column family” (if you’re a relational database user think “table” for now) called “Toys”.
<Keyspace Name="ToyStore">
<ColumnFamily Name="Toys" CompareWith="UTF8Type" />
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
<ReplicationFactor>1</ReplicationFactor
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
The “CompareWith” attribute in the “ColumnFamily” tag you just added controls how row keys are indexed and sorted. The “UTF8Type” value key indicates that you’re indexing by UTF8 characters. Other possible values include”AsciiType”, “LongType” (64-bit long integers) and “BytesType” (straight bit-to-bit comparison – the default value).
The other tags here (“ReplicaPlacementStrategy”, “ReplicateFactor” and “EndPointSnitch”) set a few replication options we’ll get into later.
Save your changes and restart your Cassandra server. If you can connect with the Cassandra client, issue the following command and see “ToyStore” in the result you’ve succeeded in adding your first keyspace to Cassandra!
cassandra> show keyspaces
Keyspace1
system
ToyStore
Add Data To An Existing Keyspace on Cassandra
Now that we have a new “ToyStore” keyspace it’s time to add some data. If you were watching closely you’ll notice that we did more than add a keystore in the previous step: we added our first “column family” too. (Think “table” if you’re coming from a relational database background.)

To get started adding data, restart your Cassandra client and use the following syntax to add six name/value pairs to the “Toys” column family of your new “ToyStore” keyspace.
cassandra> set ToyStore.Toys['Transformer']['Price'] = ’29.99′
Value inserted.
cassandra> set ToyStore.Toys['GumDrop']['Price'] = ’0.25′
Value inserted.
cassandra> set ToyStore.Toys['MatchboxCar']['Price'] = ’1.49′
Value inserted.
cassandra> set ToyStore.Toys['Transformer']['Section'] = ‘Action Figures’
Value inserted.
cassandra> set ToyStore.Toys['GumDrop']['Section'] = ‘Candy’
Value inserted.
cassandra> set ToyStore.Toys['MatchboxCar']['Section'] = ‘Vehicles’
Value inserted.
If you run a “help” command from the Cassandra client you will see the following syntax for the kind of “set” command we just used:
set <ksp>.<cf>['<key>']['<col>'] = '<value>'
Let’s break this command syntax down using one of the commands we just typed.
set ToyStore.Toys['Transformer']['Price'] = '29.99'
According to our command syntax, the command we typed meant this:
- ksp = KeySpace = “ToyStore”
- cf = Column Family = “Toys”
- key = Row Key (an indexed key which links multiple columns) = “Transformer”
- col = Single Column Name (the name in a single name/value pair) = “Price”
- val = Single Column Value (the value in a single name/value pair) = “29.99″

These six commands created a total of three rows in the “Toys” column family: “Transformer”, “GumDrop” and “MatchboxCar”. Within each row you created two columns: “Section” and “Price”. Sketched out in a diagram the data you inserted would look something like this:

Within the “Toys” column family, you could also represent this data in JSON like this:
{
"Transformer" : {
"Price" : "29.99",
"Section" : "Action Figures"
}
"GumDrop" : {
"Price" : "0.25",
"Section" : "Candy"
}
"MatchboxCar" : {
"Price" : "1.49",
"Section" : "Vehicles"
}
}
Starting to make sense? Now, let’s try to pull this data back.
Retrieve Data From An Existing Keyspace on Cassandra
Let’s start by counting the number of name/value pairs (i.e., “columns”) stored under one of the row keys we just inserted.
cassandra> count ToyStore.Toys['GumDrop']
2 columns
If you followed directions, the answer will be “2 columns”, whether you use “GumDrop”, “Transformer” or “MatchboxCar” as your column key.
Now try spelling out the row key in all lowercase.
cassandra> count ToyStore.Toys['gumdrop']
0 columns
Yes, Cassandra row keys are case-sensitive. Consider yourself warned, especially if you’re coming from a database environment where cases are insensitive.
Now trying spelling out the row key that doesn’t exist.
cassandra> count ToyStore.Toys['RedMatterBall']
0 columns
Notice that you didn’t get a “no column exists” error on your count statement; instead you were simply told that zero name/value pairs exist for your non-existent row key.
Now that you know to be careful with the exact name and case of your row keys, let’s pull back the data in a particular row instead of just counting how many columns it contains. To do this, use the “get” command as shown below.
cassandra> get ToyStore.Toys['GumDrop']
=> (column=Section, value=Candy, timestamp=1278132493790000)
=> (column=Price, value=0.25, timestamp=1278132306875000)
Returned 2 results.
The two “column” and “value” entries look familiar but there’s a third item in each of our columns: “timestamp”. That value represents the time when you made each column entry. Timestamp may not mean much to us yet (we will safely ignore it for another article or two), but timestamp will mean a great deal to us when we start merging column inserts/updates from two or more Cassandra database nodes.
By the way, here’s how you could represent the timestamp on each column in your diagram:

But back to our data retrieval task. Before we move on, try at least one row key that doesn’t exist.
cassandra> get ToyStore.Toys['RedMatterBall']
Returned 0 results.
Again, note that Cassandra reports that there are “0 results” for this row key, not that this row key doesn’t exist.
The last thing we’re going to do in this article is drill down into an existing row and only pick out one column (i.e., one name/value pair).
cassandra> get ToyStore.Toys['GumDrop']['Price']
=> (column=Price, value=0.25, timestamp=1278132306875000)
Now try this with a valid row key and an invalid column.
cassandra> get ToyStore.Toys['GumDrop']['Taste']
Exception null
This time we got an error rather than a “count of zero” message!
Relational database folks, are you starting to see the pattern? (Hint: Using non-existent row keys is like executing a “SELECT COUNT(*) FROM DB” with a WHERE clause that matches nothing, but using non-existent column names is like executing a query with invalid fields.)
For more information about columns, row keys and another Cassandra structure called a “super column” (or “subcolumn”), please see my recent article on that subject.
Troubleshooting
If you encounter a “Fatal error: Invalid replicaplacementstrategy class org.apache.cassandra.locator.RackUnawareStrategy” while trying to start up your Cassandra server, this may be caused by spaces or newlines in your “ReplicaPlacementStrategy” keyspace definition. (Remember this definition is found in your “storage-conf.xml” file.) Make sure this line looks like: “<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>” – with NO spaces or other spacing characters anywhere within this tag!
If you encounter a “supercolumn parameter is not optional for super CF” when you try to issue an otherwise valid “set” command from the Cassandra client, you probably incorrectly defined the column referenced in your set command as a “super” column in your keyspace definition. To fix, take out the ‘ColumnType=”Super”‘ and ‘CompareSubcolumnsWith=”UTF8Type”‘ attributes from that column’s definition in your “storage-conf.xml” file, then restart the Cassandra server.