
Archive for September, 2010

Java Web Servers and Networking Libraries

25 Sep

Recently I’ve been using Mina Core from Apache to do some async/scalable multi-tier development. I originally got hooked on Mina a few years ago when I needed to use Apache Mina FtpServer for some prototypes. I found the FtpServer to be of good quality so I kept Mina in the back of my mind.

It turns out that I later wrote my own HTTP and FTP servers (unfortunately not open sourced – I don’t own the code). Now that I’m back to writing prototypes that require a powerful socket library I have returned to using Mina (2.0). It is funny how much its networking design resembles my own design for the HTTP server – for example, in this tutorial and the related examples, the use of filters, buffers and decoding state all looks quite familiar. Guess I shouldn’t have reinvented the wheel.

But beyond that, what strikes me is the number of socket libraries out there for Java. For example, I have read some positive reactions to Grizzly and to xSocket. I’m sure there are others, and I wish I had more time to research them all, because I do wonder how different they really are from one another.

To make matters interesting, there are also the various web servers derived from these libraries: xSocket has xLightweb, Grizzly comes with a web server, Mina has a sub-project called AsyncWeb, and then there is Simple, which apparently built its own network library. Not to mention the proven standbys such as Jetty, Tomcat and Caucho Resin, most of which have their own networking libraries as well. There are a lot of other Java HTTP servers too: http://java-source.net/open-source/web-servers.

Fortunately my day-to-day work brings me into contact with a lot of different technologies, but it’s still hard to keep on top of them all and to discover all the good ideas out there. Currently Simple has my eye as far as HTTP servers go, and Mina as far as networking libraries go. Mina has a familiar style, and even if it is not the *fastest* it is certainly in the right ballpark.

 

Posted in Jetty

 

Stored Procedures in Cassandra

18 Sep

Whether or not business logic should live within the database is a much-discussed topic. Academically the preference seems to be to separate the layers. However, one of the best-performing applications I ever worked on placed the lion’s share of its business logic within the database – and that was with another nosql database, MUMPS.

Out in the wild the reality is that there are many variations – sometimes business logic is distributed across many servers, sometimes it is all on one server. And sometimes stored procedures are used but contain a minimum of business logic and instead enhance the efficiency of data storage or retrieval.

It is my opinion that Cassandra should support an easy-to-use model of stored procedures and let users decide the level of appropriateness for their software needs. One of the concerns often associated with stored procedures is the lack of portability amongst database vendors and the resulting vendor lock-in. With Cassandra you have that concern with the data model anyway, so an investment in Cassandra suggests you may wish to try to get the most out of what the software has to offer.

Another challenge of stored procedures is the often complex and unfamiliar language syntax/API. I find T-SQL fairly reasonable to work with, for example, but it can be a jolt to switch to it if you spend most of your time coding in Java or C#. The stored procedure solution we are developing for Cassandra will instead leverage the familiarity of Javascript.

With the emergence of Web 2.0 and HTML 5, Javascript is playing an ever more important role on the client. Along with this comes a revived interest in Javascript on the server. With our enhancements to Cassandra we’ll offer a complete end-to-end model for developing applications exclusively in Javascript. Indeed, by blurring the differences between web services and stored procedures, our enhancements will provide an efficient and concise programming model that supports very high performance code.

And let’s not forget about distributed computing – it is our primary focus, after all. An obvious, but still beautiful, outcome of using Cassandra to *store* the stored procedures and web services (and web sites) is that you do not have to replicate your code across all the nodes yourself – Cassandra already does that. Updating a web service or stored procedure is as simple as a single update or insert call to Cassandra.

What’s more, we will provide the ability to version your stored procedures and web services, along with the ability to select which version is active. This enables a complete roll-out (replication) of a new version while the old version is still running; then flip a switch and all the nodes will start using the new code. Flip the switch again if you need to revert to the old version.
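Conceptually, the publish/flip/revert cycle amounts to keeping every version of the code plus a small "active version" pointer. Here is a hypothetical sketch of that data model (Python used purely for illustration – the class and method names are made up and are not our actual API):

```python
# Hypothetical sketch of versioned stored procedures with an active-version
# pointer. In practice both maps would live in Cassandra itself, so the code
# and the pointer replicate across nodes automatically.

class ProcStore:
    def __init__(self):
        self.versions = {}   # (proc_name, version) -> source code
        self.active = {}     # proc_name -> currently active version

    def publish(self, name, version, source):
        # Roll out a new version; it replicates but is not yet live.
        self.versions[(name, version)] = source

    def activate(self, name, version):
        # "Flip the switch": all nodes now resolve to this version.
        if (name, version) not in self.versions:
            raise KeyError("version not rolled out yet")
        self.active[name] = version

    def fetch(self, name):
        # Resolve the currently active source for a procedure.
        return self.versions[(name, self.active[name])]
```

Reverting is just another `activate` call pointing back at the old version – no redeployment involved.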

The addition of stored procedures and web services will open a lot of new possibilities with Cassandra and we are excited to be innovating in this area. A strong database is the key to a strong distributed system.

 

Posted in Cassandra, Elastic Architecture, MUMPS, nosql

 

Migrate a Relational Database into Cassandra (Part IV – Northwind Import)

11 Sep

This article shows how to prepare and import a dataset expressed in Cassandra-friendly JSON into a Cassandra datastore using Cassandra’s “json2sstable” utility.

Before proceeding, you should understand my previous "Part III" article on "Northwind Conversion" – this article imports the JSON dataset created there.

You should also have downloaded and installed either the .NET or Mono version of the DivConq JSON command-line utilities, and you should have a complete JSON document from a conversion of the Northwind database export.  (You can also start with the "JSONout7.txt" document from this archive.)

Cleaning the Data

So far everything we’ve done has simply moved data around.  This has led to a JSON structure that contains everything and then some from the original relational database.  We could import that, but from here on out we’ll treat this data more like a traditional data warehouse by only working with a subset of the original data.

To do the stripping, we can use a DivConq utility called “StripNodesFromJSON”.   The following batch snippet cuts out extra nodes (like “Shippers”) and tags (like “Phone”).

rem Let us turn this structure into something we can use in a data warehouse
rem Strip a lot of the extra tags out
rem x Get rid of the Shippers node
StripNodesFromJSON JSONout7.txt JSONout8.txt Shippers
rem x Get rid of extra nodes from the Employee node
StripNodesFromJSON JSONout8.txt JSONout9.txt PostalCode Photo Address ReportsTo HireDate HomePhone Notes BirthDate Extension
rem x Get rid of extra nodes from the Customer node
StripNodesFromJSON JSONout9.txt JSONout10.txt City Phone Region ContactTitle Address PostalCode Fax ContactName
rem x Get rid of extra nodes from the ItemEntry nodes
StripNodesFromJSON JSONout10.txt JSONout11.txt OrderID Product_UnitPrice Product_UnitsInStock Product_QuantityPerUnit Product_ReorderLevel Supplier_City Supplier_Region "Order Details_AutoID" Product_CategoryID Supplier_ContactTitle Supplier_ContactName Product_Discontinued Supplier_HomePage Supplier_PostalCode Supplier_Address Category_CategoryID  Category_Picture Category_Description Supplier_Fax Supplier_Phone
rem x Get rid of extra nodes from OrderInformation
StripNodesFromJSON JSONout11.txt JSONout12.txt OrderID ShipPostalCode ShipCountry CustomerID EmployeeID

If you do a directory listing on the intermediate files created in this batch file you should see that each one is smaller than the one before it.
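For readers without the DivConq utilities handy, the effect of StripNodesFromJSON can be approximated in a few lines of code. This sketch (Python, purely for illustration – the real utility's exact matching rules may differ) recursively drops any node or tag whose name appears in the strip list:

```python
def strip_nodes(obj, names):
    """Recursively drop every dict key whose name is in `names` -- a rough
    stand-in for the StripNodesFromJSON utility."""
    if isinstance(obj, dict):
        return {k: strip_nodes(v, names) for k, v in obj.items()
                if k not in names}
    if isinstance(obj, list):
        return [strip_nodes(v, names) for v in obj]
    return obj

# Equivalent in spirit to:
#   StripNodesFromJSON JSONout7.txt JSONout8.txt Shippers
# (toy data; the "555-0100" phone number is made up)
doc = {"Shippers": {"1": {"CompanyName": "Speedy Express"}},
       "Orders": {"10778": {"Customer": {"Phone": "555-0100",
                                         "Country": "Sweden"}}}}
cleaned = strip_nodes(doc, {"Shippers", "Phone"})
```

Because the function returns new objects rather than mutating in place, each pass naturally produces a smaller document, just like the chain of intermediate files above.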

Cassandra’s JSON2SStable Format

If you’ve worked with Cassandra’s SStable2JSON utility, you’ve seen that the format Cassandra uses for its JSON datasets is not human-readable.

Cassandra’s SStable2JSON utility will export plain (no supercolumn) Column Families like this:

{
  "HotWheelsCar": [
    ["5072696365", "312e3439", 1278132336497000, false],
    ["53656374696f6e", "56656869636c6573", 1278132515996000, false]
  ],
  "GumDrop": [
    ["5072696365", "302e3235", 1278132306875000, false],
    ["53656374696f6e", "43616e6479", 1278132493790000, false]
  ]
}

…and will export supercolumn-filled Column Families like this:

{
  "ultralights": {
    "756c3233": {
      "deletedAt": -9223372036854775808,
      "subColumns": [
        ["7365617432", "392070656f706c65", 1283394499763000, false]
      ]
    }
  },
  "planes": {
    "706c616e65313436": {
      "deletedAt": -9223372036854775808,
      "subColumns": [
        ["726f773138", "372070656f706c65", 1283394371843000, false],
        ["726f773237", "322070656f706c65", 1283394387348000, false]
      ]
    },
    "706c616e65353436": {
      "deletedAt": -9223372036854775808,
      "subColumns": [
        ["726f773232", "332070656f706c65", 1283394349929000, false]
      ]
    }
  }
}

Several things are different from our JSON sets to date:

  • The data and supercolumn names are hex-encoded rather than plain strings.  For example, instead of “Price”, you see “5072696365” in the JSON above. (Use this to try it yourself.)
  • There are extra fields, such as “deletedAt”, and trailing flags such as “false”.  Fortunately, it appears that these can be faked up.
  • Columns are filed under a “subColumns” node within each supercolumn entry.
  • JSON array structures are used in place of hierarchies.

…but it’s not impossible, it’s just different.
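The hex encoding itself is nothing exotic: each name or value is simply the hexadecimal rendering of its UTF-8 bytes. A one-liner (Python, for illustration) reproduces the values seen in the exports above:

```python
def to_hex(s):
    """Hex of the UTF-8 bytes, as sstable2json renders names and values."""
    return s.encode("utf-8").hex()

# Matches the plain Column Family export above:
#   "Price"   -> "5072696365"
#   "1.49"    -> "312e3439"
#   "Section" -> "53656374696f6e"
```

Decoding goes the other way with `bytes.fromhex(...).decode("utf-8")`, which is handy when eyeballing an export.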

Creating JSON2SStable Format Files

Before we continue, we need to wrap our JSON datasets in one more node to represent the datastore – so far our top level has been column families, and the only remaining column family is now “Orders”.

Fortunately we can do this without a special utility: just a few lines of a batch file are needed to add a top-level “Northwind” node.

rem Add an extra wrapper for the name of the datastore
echo { "Northwind" : > JSONout12a.txt
type JSONout12.txt >> JSONout12a.txt
echo } >> JSONout12a.txt

Now we’re finally ready to use a DivConq utility to convert our human-readable JSON into the format needed by Cassandra’s JSON2SStable utility.  This part is easy.

rem Now convert wrapped dataset to json2sstable-ready
rem Cassandra array import format
PrepJSONForSSTableImport JSONout12a.txt JSONout13.txt

Now you should have a new, larger file filled with all the information Cassandra will need for its native import utility.
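To see what the conversion involves, here is a rough sketch of how a human-readable supercolumn family maps onto the json2sstable structure. This is my reading of the target format, not the utility's actual source (Python, for illustration; the timestamp is an arbitrary fake, as the article notes these can be faked up):

```python
DELETED_AT = -9223372036854775808  # sentinel seen in the sstable2json output above

def hx(s):
    # Names and values become the hex of their UTF-8 bytes.
    return str(s).encode("utf-8").hex()

def to_import_format(rows, timestamp):
    """Sketch of the transformation PrepJSONForSSTableImport appears to perform
    for a supercolumn family: hex-encode names and values, nest columns in
    'subColumns' arrays, and fake up 'deletedAt' and the deleted flag."""
    out = {}
    for row_key, supercols in rows.items():
        out[row_key] = {
            hx(name): {
                "deletedAt": DELETED_AT,
                # columns sorted by hex name, each as [name, value, ts, deleted]
                "subColumns": sorted(
                    [hx(col), hx(val), timestamp, False]
                    for col, val in cols.items()
                ),
            }
            for name, cols in supercols.items()
        }
    return out
```

Run against the toy "ultralights" data, this reproduces the supercolumn export shown earlier ("ul23" becomes "756c3233", "seat2" becomes "7365617432", and so on).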

The Whole Export, Convert and Prep for Import Process

You may have noticed that the DivConq utilities ship with an “exportandimport.bat” file that performs all the steps covered so far.  Running this batch file should generate output like this.

C:\divconq\dotnet>exportandimport
22:38:39 Found expected organization in the "Orders" object.
22:38:39 Found expected organization in the "Order Details" object.
22:38:40 Completed OK.  Moved 2155 children and found 0 orphans.
22:38:40 WARNING: MergeAsName does not contain an [ID] or other macro.  This cou
ld lead to invalid JSON through duplicate keys in merged children!
22:38:40 Found expected organization in the "Orders" object.
22:38:40 Found expected organization in the "Employees" object.
22:38:41 Completed OK.  Moved 830 children and found 0 orphans.
22:38:41 WARNING: MergeAsName does not contain an [ID] or other macro.  This cou
ld lead to invalid JSON through duplicate keys in merged children!
22:38:42 Found expected organization in the "Orders" object.
22:38:42 Found expected organization in the "Customers" object.
22:38:43 Completed OK.  Moved 830 children and found 0 orphans.
22:38:43 WARNING: MergeAsName does not contain an [ID] or other macro.  This cou
ld lead to invalid JSON through duplicate keys in merged children!
22:38:43 Found expected organization in the "Orders" object.
22:38:43 Found expected organization in the "Products" object.
22:38:45 Completed OK.  Moved 2155 children and found 0 orphans.
22:38:45 WARNING: MergeAsName does not contain an [ID] or other macro.  This cou
ld lead to invalid JSON through duplicate keys in merged children!
22:38:46 Found expected organization in the "Orders" object.
22:38:46 Found expected organization in the "Suppliers" object.
22:38:48 Completed OK.  Moved 2155 children and found 0 orphans.
22:38:48 WARNING: MergeAsName does not contain an [ID] or other macro.  This cou
ld lead to invalid JSON through duplicate keys in merged children!
22:38:50 Found expected organization in the "Orders" object.
22:38:50 Found expected organization in the "Categories" object.
22:38:52 Completed OK.  Moved 2155 children and found 0 orphans.
22:38:53 Found expected organization in the "Orders" object.
22:38:55 Completed OK.  Moved 11620 children and found 0 orphans.
22:38:57 Completed OK.  Deleted 1 nodes.
22:38:59 Completed OK.  Deleted 9130 nodes.
22:39:01 Completed OK.  Deleted 6640 nodes.
22:39:02 Completed OK.  Deleted 43930 nodes.
22:39:03 Completed OK.  Deleted 4980 nodes.
22:39:08 Completed OK.  Did 39140 nodes.

This batch file and any of its commands can, of course, be modified to taste or to work with other datasets.

Importing Into Cassandra

If you simply run Cassandra’s JSON2SStable command you’ll see some short usage information.

C:\work\apache-cassandra-0.6.3>bin\json2sstable.bat
Missing required options: Kc
Usage: org.apache.cassandra.tools.SSTableImport -K keyspace -c column_family <j
on> <sstable>

…but please use the following procedure to properly import your JSON dataset.

First, shut down your Cassandra client and server (if started).  Then go into your Cassandra folder and open your “conf\storage-conf.xml” file.  Add the following entry to this file and save.  (You can substitute your own name for “NorthwindOne” as long as you use it consistently below.)

<Keyspace Name="NorthwindOne">
  <ColumnFamily Name="Orders"
     CompareWith="UTF8Type"
     ColumnType="Super" CompareSubcolumnsWith="UTF8Type"
     />
   <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
   <ReplicationFactor>1</ReplicationFactor>
   <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

Once you’ve saved this file, start the Cassandra server again.  If your configuration changes were accepted, this will create a new, empty directory in your Cassandra server’s folder store.

You should also fire up the Cassandra client to check that your new datastore is live.

C:\work\apache-cassandra-0.6.3>bin\cassandra-cli --host localhost
Starting Cassandra Client
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra> show keyspaces;
NorthwindOne
Keyspace1
system

Now, stop the Cassandra server again and shut down the Cassandra client again.  (The Cassandra client doesn’t respond well to the server going up and down.)

To properly invoke the JSON2SStable utility, use the following syntax, substituting the appropriate values and paths as necessary.

In the example below, “NorthwindOne” is the name of our keyspace and must match the value we saved into the “conf\storage-conf.xml” file above.  “Orders” is the name of the new column family we will be creating and inserting our native-formatted JSON into.  The path to the “JSONout13.txt” file is, of course, the file we’re importing.  Finally, the path to the “Orders-1-Data.db” file indicates which Cassandra data file we will create.  Note that this file does not yet exist, but the rest of the path (the folder structure) must already be in place.

C:\work\apache-cassandra-0.6.3>bin\json2sstable.bat -K NorthwindOne -c Orders C
\divconq\dotnet\JSONout13.txt C:\var\lib\cassandra\data\NorthwindOne\Orders-1-D
ata.db

If this works correctly, the utility will take a few seconds to import the data and will then silently return you to the command prompt.  If you see any other output from this command, you encountered an error.

Another way to quickly confirm that the data was imported successfully is to eyeball the Cassandra data directory.  This should now contain three new files: Orders-1-Data.db, Orders-1-Filter.db and Orders-1-Index.db.

If you see these files, go ahead and fire up your Cassandra server and client again.

Working With Your Imported Data

Finally, it’s time to view the live data on the live Cassandra server. Try these commands first.

C:\work\apache-cassandra-0.6.3>bin\cassandra-cli --host localhost
Starting Cassandra Client
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra> show keyspaces;
NorthwindOne
Keyspace1
system
cassandra> get NorthwindOne.Orders['10778']['OrderInformation']
=> (column=ShippedDate, value=12/24/1997 12:00:00 AM, timestamp=1269842588093)
=> (column=ShipVia, value=1, timestamp=1269842588093)
=> (column=ShipRegion, value=, timestamp=1269842588093)
=> (column=ShipName, value=Berglunds snabbköp, timestamp=1269842588093)
=> (column=ShipCity, value=Luleå, timestamp=1269842588093)
=> (column=ShipAddress, value=Berguvsvägen  8, timestamp=1269842588093)
=> (column=RequiredDate, value=1/13/1998 12:00:00 AM, timestamp=1269842588093)
=> (column=OrderDate, value=12/16/1997 12:00:00 AM, timestamp=1269842588093)
=> (column=Freight, value=6.7900, timestamp=1269842588093)
Returned 9 results.
cassandra> get NorthwindOne.Orders['10778']
=> (super_column=OrderInformation,
     (column=Freight, value=6.7900, timestamp=1269842588093)
     (column=OrderDate, value=12/16/1997 12:00:00 AM, timestamp=1269842588093)
     (column=RequiredDate, value=1/13/1998 12:00:00 AM, timestamp=1269842588093)
     (column=ShipAddress, value=Berguvsvägen  8, timestamp=1269842588093)
     (column=ShipCity, value=Luleå, timestamp=1269842588093)
     (column=ShipName, value=Berglunds snabbköp, timestamp=1269842588093)
     (column=ShipRegion, value=, timestamp=1269842588093)
     (column=ShipVia, value=1, timestamp=1269842588093)
     (column=ShippedDate, value=12/24/1997 12:00:00 AM, timestamp=1269842588093)
)
=> (super_column=ItemEntry_1393,
     (column=Category_CategoryName, value=Seafood, timestamp=1269842588093)
     (column=Discount, value=0, timestamp=1269842588093)
     (column=ProductID, value=41, timestamp=1269842588093)
     (column=Product_ProductID, value=41, timestamp=1269842588093)
     (column=Product_ProductName, value=Jack's New England Clam Chowder, timesta
mp=1269842588093)
     (column=Product_SupplierID, value=19, timestamp=1269842588093)
     (column=Product_UnitsOnOrder, value=0, timestamp=1269842588093)
     (column=Quantity, value=10, timestamp=1269842588093)
     (column=Supplier_CompanyName, value=New England Seafood Cannery, timestamp=
1269842588093)
     (column=Supplier_Country, value=USA, timestamp=1269842588093)
     (column=Supplier_SupplierID, value=19, timestamp=1269842588093)
     (column=UnitPrice, value=9.6500, timestamp=1269842588093))
=> (super_column=Employee,
     (column=Country, value=USA, timestamp=1269842588093)
     (column=FirstName, value=Janet, timestamp=1269842588093)
     (column=LastName, value=Leverling, timestamp=1269842588093)
     (column=Title, value=Sales Representative, timestamp=1269842588093)
     (column=TitleOfCourtesy, value=Ms., timestamp=1269842588093))
=> (super_column=Customer,
     (column=CompanyName, value=Berglunds snabbköp, timestamp=1269842588093)
     (column=Country, value=Sweden, timestamp=1269842588093))
Returned 4 results.

You can pick other Order IDs and supercolumn values (e.g., “Customer”, “Employee”, various “ItemEntry_” values) to view those values too.

Next Steps

At this point you have the tools and documentation to not only import the Microsoft Northwind Access database into Cassandra, but similar databases as well. This concludes the “Migrate a Relational Database into Cassandra” series of articles.

The next set of articles will describe how to build a working application on top of Cassandra.

Troubleshooting

If you encounter errors during import, feel free to shut down the server and wipe all the data files from the folder.

Also look for entries like this in the “C:\var\log\cassandra\system.log” file.  While this specific instance reflects an import problem (caused by importing extra data through this process), such errors are really telling you that the “Trains-2-Data.*” files are useless but that the older “Trains-1-Data.*” files are still good.

 INFO [main] 2010-09-02 21:48:20,172 SSTableReader.java (line 120) Sampling index for C:\var\lib\cassandra\data\TransSchedTwo\Trains-1-Data.db
 INFO [main] 2010-09-02 21:48:20,177 SSTableReader.java (line 120) Sampling index for C:\var\lib\cassandra\data\TransSchedTwo\Trains-2-Data.db
ERROR [main] 2010-09-02 21:48:20,183 ColumnFamilyStore.java (line 182) Corrupt file C:\var\lib\cassandra\data\TransSchedTwo\Trains-2-Data.db; skipped
 

Posted in Cassandra, Intermediate, JSON, Northwind

 

Resolving JSON data types in Cassandra

02 Sep

In my previous entry I asked “Why does Apache Cassandra have data types?” Now that the rationale has been covered, what does that mean for us?

At DivConq we like to move data around with JSON, as you can probably tell. So the question arises how do we want JSON data types to work with Cassandra?

Read the rest of this entry »

 

Posted in Cassandra, JSON, MUMPS

 

Cloud Security Alliance’s Certificate of Cloud Security Knowledge (CCSK) Now Available

01 Sep

Today the Cloud Security Alliance announced that their new Certificate of Cloud Security Knowledge (CCSK) was now available.   This exciting certificate tests awareness of cloud security threats and best practices for securing the cloud.  The material covered in the one-hour, 50-question examination is largely encapsulated in two documents: “Security Guidance for Critical Areas of Focus in Cloud Computing” by the Cloud Security Alliance and the European Network and Information Security Agency (ENISA) whitepaper “Cloud Computing: Benefits, Risks and Recommendations for Information Security”.

Among the companies planning to certify their employees as CCSKs are eBay, ING, Lockheed Martin, Sallie Mae, Zynga, CA, CaseCentral, HCL Technologies, Hubspan, LogLogic, Fiberlink, McAfee, Ping Identity, Novell, Qualys, Solutionary, Symantec, Trend Micro, Veracode, VeriSign, Vordel, WhiteHat Security and Zscaler.

 

Posted in Cloud, Regulation