Monthly Archives: December 2010

Skype Demonstrates Need For Distributed Version Control

Skype was brought to its knees just before Christmas this year, leaving many families without a line of communication they had unexpectedly come to rely on.

The root cause wasn’t poor design, wildly unexpected traffic flows or network disruption.  Instead, as detailed in this Skype CIO blog entry, it was a bug in an older version of the Skype software that caused Windows nodes to crash when they encountered delayed messages.   This bug was widespread among deployed Skype nodes (up to 50% had it), and 40% of those crashed during the outage.   The loss of 20% of Skype’s traffic capacity and 25-30% of the “supernodes” used to direct Skype traffic was more than the distributed system could bear.

Skype has an auto-update feature, and the bug was fixed before the outage began, but Skype’s auto-update wasn’t fast enough to prevent it.   Skype’s CIO admitted the problem in his blog and pledged to fix it.

“We will also be reviewing our processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software.”

To dig out from the problem, Skype staff spent most of a day putting up thousands of extra supernodes – a fix that itself would have been impossible if Skype had not been designed to be a highly distributed system with interchangeable nodes.   However, the original outage demonstrates the ongoing need for version control in distributed systems.


Posted by on 2010-Dec-12 in Elastic Architecture, Other Organizations


Database Challenge

The primary theme in the NoSQL camp is “big data” while the primary theme in the SQL camp is very strict adherence to ACID. I think both themes fall short of the true challenge we face in modern computing, which is distributed computing: specifically, multi-data-center active/active applications where a failure in one data center will not prevent the other data centers from operating.

Many NoSQL databases have some support for database replication, so at first glance NoSQL may appear to be the stronger candidate for distributed computing. This is likely further reinforced by the very strict ideals SQL developers embrace, which feel limiting to distributed developers. Nonetheless, I have been thinking lately about adapting RDBMS design to distributed applications.

The first challenge for a strict SQL developer is accepting that committed data will not always be the data read by an application. Imagine a distributed network of database nodes, each node live and each holding a complete copy of the database. An application writes a new value to database node A. Soon after, before the data has replicated from node A to node B, another application updates the same record on node B. Once replication occurs, the original update (on A) will probably be lost completely.

There is a desire in the SQL developer to believe that once data is committed it will never be lost, either because subsequent updates would first read the new value or because subsequent updates will touch only the relevant fields.

Let’s look at a real-world example of this problem. Imagine a medical records database with a Patient table. We’ll look at just a few values in the database for the patient Katie Ronin.

Id     Name          Primary Phone   Last Vitals    Blood Pressure
6388   Katie Ronin   111.222.6666    May 18, 2010   122/79

Now let’s say that a few weeks ago Katie filled in a form for her insurance company, and on that form she updated her phone number to 111.555.3333. Today (Dec 12, 2010) a data entry technician is entering her new phone number from the form into the database. Also today, Katie saw her doctor and her vitals were taken. Her blood pressure is being entered by the nurse as 119/78.

Unfortunately both the nurse and technician enter the data at the exact same time. With a single database node, and when using transactions, it is reasonable to suppose that there will be no problem. Hopefully the technician’s update will look like this:

UPDATE Patient SET Phone='111.555.3333' WHERE Id = 6388

And the nurse’s update will look like this:

UPDATE Patient SET LV='Dec 12, 2010', BP='119/78' WHERE Id = 6388

So neither update will be lost, nor will the updates collide because of Atomicity and Isolation (from ACID).
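To make the single-node case concrete, here is a small sketch using Python’s built-in sqlite3 module. The schema and the column names LastVitals and BloodPressure are invented for illustration; the point is simply that two transactions touching disjoint columns compose cleanly:

```python
import sqlite3

# Hypothetical schema mirroring the Patient example above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Patient (
    Id INTEGER PRIMARY KEY, Name TEXT, Phone TEXT,
    LastVitals TEXT, BloodPressure TEXT)""")
conn.execute("INSERT INTO Patient VALUES "
             "(6388, 'Katie Ronin', '111.222.6666', 'May 18, 2010', '122/79')")

# The technician's update touches only the phone column...
conn.execute("UPDATE Patient SET Phone='111.555.3333' WHERE Id = 6388")
# ...and the nurse's update touches only the vitals columns.
conn.execute("UPDATE Patient SET LastVitals='Dec 12, 2010', "
             "BloodPressure='119/78' WHERE Id = 6388")

row = conn.execute("SELECT Phone, LastVitals, BloodPressure "
                   "FROM Patient WHERE Id = 6388").fetchone()
print(row)  # both updates survive: ('111.555.3333', 'Dec 12, 2010', '119/78')
```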

But what if this happens in a distributed database? Is there a problem? Probably there is, and here is why.

  1. If we want database nodes to be live/active while isolated (e.g. due to network errors) then we will probably end up with a requirement to replay transactions in any order. More on this in a future blog post.
  2. If we need to replay transactions in any order, then our replication mechanism will copy snapshots of entire records, not just the changed fields.
  3. This means that updates can be lost, see below.

So again with our example, only this time the update from the technician is on database node 204 while the update from the nurse is on node 71. After the updates, but before the replication, here is what the record looks like on each of the two nodes:

Node 204:
Id     TxStamp              Name          Primary Phone   Last Vitals    Blood Pressure
6388   20101212T104452204   Katie Ronin   111.555.3333    May 18, 2010   122/79

Node 71:
Id     TxStamp              Name          Primary Phone   Last Vitals    Blood Pressure
6388   20101212T104452071   Katie Ronin   111.222.6666    Dec 12, 2010   119/78

After the replication, which again copies the entire record, the final record will be the one with the phone number update because that is the higher TxStamp. The vitals data has been lost.
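The merge rule described above, copy the whole record and let the highest TxStamp win, can be sketched in a few lines of Python. The merge function and field names here are illustrative, not any real replication API:

```python
# Toy last-writer-wins replication: each node ships a snapshot of the
# whole record tagged with a TxStamp; the highest stamp wins on merge.

def merge(records):
    """Keep the record with the highest TxStamp (last writer wins)."""
    return max(records, key=lambda r: r["TxStamp"])

base = {"Id": 6388, "Name": "Katie Ronin", "Phone": "111.222.6666",
        "LastVitals": "May 18, 2010", "BloodPressure": "122/79"}

# Node 204: the technician updates the phone number.
node_204 = dict(base, Phone="111.555.3333", TxStamp="20101212T104452204")
# Node 71: the nurse updates the vitals.
node_71 = dict(base, LastVitals="Dec 12, 2010", BloodPressure="119/78",
               TxStamp="20101212T104452071")

final = merge([node_204, node_71])
print(final["Phone"])          # '111.555.3333' -- the phone update won
print(final["BloodPressure"])  # '122/79' -- the nurse's vitals were lost
```

Because the merge compares whole-record snapshots rather than individual fields, one of the two concurrent updates is silently discarded.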

I’ll go into more detail on replication issues in coming posts, but what I’ll focus on even more are strategies to avoid these issues, among them distributed database normalization and distributed aggregate data. Ultimately, with some good planning, we can find a workable solution to the above problem while still using RDBMS technologies.

A NoSQL database such as Cassandra would give us an advantage with the problem demonstrated above, since it timestamps and reconciles individual columns rather than whole records. However, it may not help with other problems we will discuss. As such, I’ll be making references to NoSQL concepts when showing some solutions for SQL developers.


Posted by on 2010-Dec-12 in Elastic Architecture, nosql


Microsoft MapReduce = Dryad

Microsoft has let it be known that its new “Dryad” technology, aimed at providing similar functionality to the Hadoop MapReduce we’re familiar with in Cassandra-land, is moving forward.  In addition, Microsoft’s DryadLINQ layer will soon target the higher-level functionality currently found in Pig and Hive.

This isn’t the first time Microsoft has talked about this technology.  That would have been the 2007 research paper from Microsoft Research in “Silicon Valley”.   In that paper the distributed architecture behind the system was laid out.  That architecture includes a “job manager” that schedules and coordinates work on many distributed platforms through a “control plane” (we think they meant “bus”).

The most important conceptual elements in Dryad are a “vertex” and a “graph”.  A graph represents a single job and is made up of one or more vertices which each represent a workflow element, or program.  The use of visual-oriented language is no accident.  Instead, designers are encouraged to sketch out their computations and merges as if working on a whiteboard.  These sketches are then translated into Dryad’s language where the consistent use of design terminology throughout encourages code fidelity to the sketch.
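To get a feel for the vertex/graph vocabulary, here is a hypothetical sketch in Python. This is not Dryad’s actual API, just an illustration of a job expressed as a DAG of small programs, with the class and method names invented for the example:

```python
# Illustrative only: a "graph" (job) made of "vertices" (programs),
# executed sequentially in the order they were added (topological order).

class Vertex:
    def __init__(self, name, fn):
        self.name, self.fn, self.inputs = name, fn, []

class Graph:
    def __init__(self):
        self.vertices = []

    def add(self, name, fn, inputs=()):
        v = Vertex(name, fn)
        v.inputs = list(inputs)
        self.vertices.append(v)
        return v

    def run(self, source_data):
        results = {}
        for v in self.vertices:  # assumes vertices added in topological order
            # A vertex with no upstream vertices reads the job's source data.
            args = [results[u.name] for u in v.inputs] or [source_data]
            results[v.name] = v.fn(*args)
        return results

g = Graph()
split = g.add("split", lambda text: text.split())
count = g.add("count", lambda words: len(words), inputs=[split])
print(g.run("the quick brown fox")["count"])  # 4
```

In the real system each vertex would run as a separate program on a separate machine, with the job manager scheduling them; the whiteboard-sketch-to-code mapping is the idea being illustrated here.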

With Dryad just entering beta and Cassandra rounding the horn on its 0.7 version, it should be an interesting year ahead for distributed data architectures.


Posted by on 2010-Dec-12 in Hadoop and Pig


Cassandra = Long Recovery Time?

A new interview with Facebook “infrastructure guru” Karthik Ranganathan on The Register (Facebook: Why our ‘next-gen’ comms ditched MySQL) adds fuel to a simmering reliability argument around Cassandra.

The Register writes: 

“For many, it’s surprising that Facebook didn’t put (its next-gen) messaging system on Cassandra, which was originally developed at the company and has since become hugely popular across the interwebs. But Ranganathan and others felt that it was too difficult to reconcile Cassandra’s ‘eventual consistency model’ with the messaging system setup.”

If you’re reading this site, you probably already know that Cassandra’s “eventual consistency model” is one that says, in a quiet world, all nodes will “eventually” get all updates from all other nodes and the entire dataset will become “consistent” across all nodes.  Of course, no system is really quiet, so there’s some lag, but reliability is generally good.
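The convergence idea can be shown with a toy sketch. This is purely illustrative, not Cassandra’s actual gossip or read-repair code; every name here is invented:

```python
# Toy eventual consistency: each node holds a (timestamp, value) pair,
# and anti-entropy rounds spread the newest write until all nodes agree.

def anti_entropy_round(nodes):
    """Every pair of nodes exchanges state; the newer timestamp wins."""
    for a in nodes:
        for b in nodes:
            newest = max(a["reg"], b["reg"], key=lambda r: r[0])
            a["reg"] = b["reg"] = newest

nodes = [{"reg": (0, None)} for _ in range(5)]
nodes[2]["reg"] = (1, "hello")    # a write lands on one node only

anti_entropy_round(nodes)         # after one full round...
print({n["reg"] for n in nodes})  # ...all nodes agree: {(1, 'hello')}
```

Between the write and convergence, a read on the wrong node returns stale data; that window is exactly what Ranganathan’s messaging team objected to.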

So what did Facebook use instead for this?  HBase, the open source distributed database modeled after Google’s proprietary BigTable platform.  Ranganathan continues:

“For a product like messaging, we needed a strong consistency model [as HBase offers]. If a user sends an email, he says, you have to be able to tell the user – immediately the email was sent. If it doesn’t show that it’s been sent, I think ‘Oh, I didn’t send it,’…”

Of course, this isn’t a problem if the same application that sent the message and/or its cache is maintaining the fact that the message was just sent, but it appears that Facebook wants to be free of that assumption.

But other than that, Cassandra’s still the cat’s pajamas, right Karthik?

“(Ranganathan) also felt that the system needed HBase physical replication as opposed to Cassandra’s logical replication. With Cassandra, he says, if something gets corrupted and you lose a disk and you have to restore the data, you have to restore the entire replica. Because Facebook uses very dense machines, this would mean a very long recovery time.”

Now that’s a shot at the core of Cassandra’s eventually consistent model.  Who answered the challenge on behalf of Cassandra?

“Some Cassandra backers don’t see it that way.”

In other words…no one.  This is where folks like Riptano, a venture-capital-backed commercial company that depends on Cassandra’s commercial success, or the leaders of the Apache Cassandra project need to stand up and make their voices heard.

Let’s hope Cassandra’s forthcoming 0.7 release gets a better reception in the press.


Posted by on 2010-Dec-12 in Cassandra, Uncategorized


Microsoft Partners Are Revolting

About a year ago a well-placed* friend of mine returned from a Microsoft conference for its closest partners with an interesting story.  He was talking about a keynote in which Microsoft was announcing its plans to switch from the on-premises Exchange/Server/SQLServer deployment model Microsoft Certified loyalists depended on to a subscription model based on its new cloud services.  At the time the Microsoft Certified loyalists were less than impressed.  A typical recounting of a major partner’s question was, “OK, so we refer our loyal customers to your cloud infrastructure, and all we get is a lousy finder’s fee?”

Fast forward to the 2010 Worldwide Partner conference: even more partners get the message, including a few willing to air their gripes to The Register.

“There is support…, there are additional service offerings, but they are very small compared to the profitability of installing a new server and maintaining the client environment,” said Steve Rudd, head of business development at ITRM, a Microsoft Gold Partner with offices in Sidcup and London.

The party line here is that it’s ALL going to go to the cloud, with training, migration services and business process integration becoming the new partner revenue drivers for the next few years.

“We’re seeing an upside as customers see the price and say right, now I’ve got some budget for training. If your value proposition is to set up and configure the servers and the anti-virus, you won’t see as much of that in the future. The balance of services is changing from setup and configuration to more business-value services, value creation services,” said Tim Wallis, co-founder of London-based Content and Code, a large Microsoft partner.

What I find odd in all the “on-premises is dead, long live the cloud” hoopla is that no one is pitching a permanent hybrid model with some things in the cloud and others not.  Even Google subscribes to that.

DivConq’s position is that the optimal end state for most enterprises will be an application platform built on extremely scalable elastic architecture that provides cloud escrow (the ability to easily switch your deployment model if requirements change).

Despite the best efforts of Microsoft** and other cloud providers, developing and maintaining applications that fulfill the cloud escrow requirement of enterprise systems still requires careful project planning, visionary architecture, careful migration, well-trained coders and regular testing, plus an allotment of local development, test and alternate production servers.  Just filling all those roles should allow many Microsoft Partners to continue to thrive with minimal changes to their existing service offerings.  (Those who don’t offer any of these services today might as well drop the “V” from their “VAR” designation now.)

Yes, the days of Exchange markup gravy may be over, but there are plenty of new challenges ahead in supporting emerging technologies…and those challenges aren’t really that different from the ones we faced 5, 10 or 15 years ago.

* = Fortune 500 Strategic IT VP

** = See Microsoft’s recent efforts to unify SQL Server interfaces in cloud and on-premises deployments, the Orleans Framework, etc.


Posted by on 2010-Dec-12 in Uncategorized


Microsoft Announces “Orleans” – a New Cloud Framework

Microsoft’s eXtreme Computing Group hit an interesting ball in play with the announcement of their “Orleans” cloud framework.  In the announcing blog, the authors write:

Orleans is a software framework for building client + cloud applications. Orleans encourages use of simple concurrency patterns that are easy to understand and implement correctly, building on an actor-like model with declarative specification of persistence, replication, and consistency and using lightweight transactions to support the development of reliable and scalable client + cloud software.

The programming model advanced by Orleans is built on “grains”: small application instances that each take one set of external inputs and then concentrate on completing the task initiated by the external inputs before turning to a second set of inputs.  Grain computations are isolated, except when they commit changes to persistent storage and make them globally visible.
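A grain-like actor can be sketched in Python (names here are invented; Orleans itself is a C#/.NET framework, and this ignores persistence and replication entirely). The key property being illustrated is that a grain drains one message completely before taking the next, so its internal state needs no locks:

```python
import queue
import threading

class Grain:
    """Toy grain: processes one message at a time from a private inbox."""

    def __init__(self):
        self.state = 0
        self.inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            msg = self.inbox.get()  # one set of inputs at a time
            self.state += msg       # isolated computation on local state
            self.inbox.task_done()

    def send(self, msg):
        self.inbox.put(msg)

g = Grain()
for _ in range(100):
    g.send(1)
g.inbox.join()   # wait until every queued message has been processed
print(g.state)   # 100 -- no lost updates despite concurrent senders
```

In Orleans the runtime, not the application, would decide where this grain lives and when its state is persisted; the single-message-at-a-time discipline is what makes that placement freedom safe.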

Basic load balancing of work is handled by the Orleans runtime, which activates grains by choosing a server from any within the available cloud, instantiating a grain, and initializing it with the grain’s persistent state.  Pointers to active grains, scalable into the billions, are maintained in a distributed directory based on technologies such as the Pastry distributed hash table and Beehive-style caching.

Orleans’ elastic architecture explicitly handles entry-level bottleneck issues such as central databases by using data replication.  However, it eschews the “eventual consistency” model used by Cassandra and others in favor of a system of “lightweight, optimistic transactions” that provide durability and atomic persistence.

Orleans is a library written in C# that runs on the Microsoft .NET Framework 4.0.

More information is available directly from the authors in a PDF here:


Posted by on 2010-Dec-12 in Cloud, Elastic Architecture, Orleans, Other Organizations
