Archive for the ‘Elastic Architecture’ Category

A New Direction: DivConq File Transfer

06 Sep

In the summer of 2011 I (Andy) started posting about DivConq Framework, our own little open source project. In 2011 and 2012 the focus of the framework was a Java connector to the MUMPS NoSQL database. In 2013 my focus shifted due to customer demands, and the DivConq Framework has now evolved into a File Transfer Framework.

This change is not really a surprise, since Jonathan’s and my professional expertise is in the file transfer industry.

Although the product is fledgling at present, we believe it presents something more than a “me too”. First off, it is open source, which is rare among enterprise-class file transfer products. Second, the design goal is to be best of breed. Third, we plan to keep it as simple as possible.

We’ll be posting more about the new Managed File Transfer (MFT) product we plan to develop, so check back. In the meantime, enjoy the latest demo, source code and Wiki on GitHub.

DivConqMFT on GitHub

 

Posted in Cloud, DivConq, Elastic Architecture, Framework, Gateway

 

Data Structures in DivConq

27 Jan

DivConq has just released an introductory presentation covering the JSON-Like data structures used within DivConq Framework.

Download PDF or View on SlideShare.

The presentation covers:
1) JSON Compatibility
2) Creating data structures
3) Accessing data structures
4) Use of dcSchema for data validation

 

Posted in DivConq, Elastic Architecture, Framework, MUMPS

 

Getting Connected With DivConq

13 Dec

The main purpose of our open-source DivConq Framework is to let Java developers quickly harness the power of MUMPS-compatible “NoSQL” databases. The real power of MUMPS comes not just from its flexible data structures, but in large part from its stored procedures. To that end, DivConq Framework provides a reasonably easy and intuitive approach to using MUMPS stored procedures.

In the MUMPS world the stored procedures are called MUMPS routines, but for those coming from a SQL background just think “rich language for stored procedure coding”. MUMPS (M) has a rich ability for working with data structures not often found in SQL or NoSQL solutions. To learn more about coding MUMPS routines, look to our Introduction to MUMPS series on this site.

This post will guide you through setting up your M and Java environments and testing connectivity.

Read the rest of this entry »

 

Posted in DivConq, Elastic Architecture, Framework, MUMPS

 

Distributed Relational Guidelines part I

12 Jan

In my last post I opened the subject of using relational databases in a distributed environment.

There are a number of tools out there for database replication. I’ve looked at SQL Server’s merge replication, Hit Software’s DBMoto product and the open source SymmetricDS. But for this discussion I’m ignoring the existing tools and looking at what I think would make the best solution and practices.

Read the rest of this entry »

 

Posted in Elastic Architecture, Uncategorized

 

Skype Demonstrates Need For Distributed Version Control

30 Dec

Skype was brought to its knees just before Christmas this year, leaving many families without a line of communication they had unexpectedly come to rely on.

The root cause wasn’t poor design, wildly unexpected traffic flows or network disruption.  Instead, as detailed in this Skype CIO blog entry, the root cause was a bug in an older version of the Skype software that caused Windows nodes to crash when they encountered delayed messages.   This bug was widespread in deployed Skype nodes (up to 50% had it) and 40% of those crashed during the outage.   The loss of 20% of Skype’s traffic capacity and 25-30% of the “supernodes” used to direct Skype traffic was more than the distributed system could bear.
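The percentages quoted above fit together with simple arithmetic. A minimal sketch in Python, using only the figures from this post (the variable names are illustrative):

```python
# Figures quoted in the post; the variable names are illustrative only.
nodes_with_bug_pct = 50     # up to 50% of deployed nodes ran the buggy version
crashed_of_those_pct = 40   # 40% of those nodes crashed during the outage

# Share of all nodes that crashed: 40% of 50%.
crashed_overall_pct = nodes_with_bug_pct * crashed_of_those_pct // 100
print(crashed_overall_pct)
```

The result is consistent with the roughly 20% loss of capacity reported above.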

Skype has an auto-update feature, and the bug was fixed before the outage began, but Skype’s auto-update wasn’t fast enough to prevent it.   Skype’s CIO admitted the problem in his blog and pledged to fix it.

“We will also be reviewing our processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software.”

To dig out from the problem, Skype staff spent most of a day putting up thousands of extra supernodes – a fix that itself would have been impossible if Skype had not been designed to be a highly distributed system with interchangeable nodes.   However, the original outage demonstrates the ongoing need for version control in distributed systems.

 

Posted in Elastic Architecture, Other Organizations

 

Database Challenge

25 Dec

The primary theme in the NoSQL camp is “big data” while the primary theme in the SQL camp is very strict adherence to ACID. I think both themes fall short of the true challenge we face in modern software, which is distributed computing, specifically multi-data center active/active applications where a failure in one center will not prevent the other data centers from operating.

Many NoSQL databases have some support for database replication, so at first glance NoSQL may appear to be the stronger candidate for distributed computing. This is likely further reinforced by the very strict ideals SQL developers embrace, which feel limiting to distributed developers. Nonetheless, I have been thinking lately about adopting RDBMS design into distributed applications.

The first challenge for a strict SQL developer is accepting that committed data will not always be the data read by an application. Imagine a distributed network of database nodes, each node holding a complete copy of the database and each node live. An application writes a new value to database node A. Soon after, before the data has replicated from node A to node B, another application updates the same record on node B. Once the replication occurs, that original update (on A) will (probably) be lost completely.

There is a desire in the SQL developer to believe that once data is committed that it will never be lost, because subsequent updates would first read the new value or because subsequent updates will update only the relevant fields.

Let’s try to get a real world example of what this problem is. Imagine a medical records database with a Patient table. We’ll look at just a few values in the database for the patient Katie Ronin.

Id     Name          Primary Phone   Last Vitals    Blood Pressure
6388   Katie Ronin   111.222.6666    May 18, 2010   122/79

Now let’s say that a few weeks ago Katie filled in a form for her insurance company and on that form she updated her phone number to 111.555.3333. Today (Dec 12, 2010) a data entry technician is entering her new phone number from the form into the database. Also today Katie saw her doctor and her vitals were taken. Her blood pressure is being entered by the nurse as 119/78.

Unfortunately both the nurse and technician enter the data at the exact same time. With a single database node, and when using transactions, it is reasonable to suppose that there will be no problem. Hopefully the technician’s update will look like this:

UPDATE Patient SET Phone='111.555.3333' WHERE Id = 6388

And the nurse’s update will look like this:

UPDATE Patient SET LV='Dec 12, 2010', BP='119/78' WHERE Id = 6388

So neither update will be lost, nor will the updates collide because of Atomicity and Isolation (from ACID).
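The single-node case can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database as a stand-in for the single node; the table layout and the column names Phone, LastVitals and BloodPressure are assumptions made for illustration. The two updates run one after the other here, which is in effect what a single node does when it serializes two simultaneous transactions.

```python
import sqlite3

# In-memory database standing in for the single node (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Patient (Id INTEGER PRIMARY KEY, Name TEXT, "
    "Phone TEXT, LastVitals TEXT, BloodPressure TEXT)"
)
conn.execute(
    "INSERT INTO Patient VALUES "
    "(6388, 'Katie Ronin', '111.222.6666', 'May 18, 2010', '122/79')"
)
conn.commit()

# Technician's transaction: touches only the phone field.
with conn:
    conn.execute("UPDATE Patient SET Phone='111.555.3333' WHERE Id = 6388")

# Nurse's transaction: touches only the vitals fields.
with conn:
    conn.execute(
        "UPDATE Patient SET LastVitals='Dec 12, 2010', BloodPressure='119/78' "
        "WHERE Id = 6388"
    )

row = conn.execute(
    "SELECT Phone, LastVitals, BloodPressure FROM Patient WHERE Id = 6388"
).fetchone()
print(row)
```

Because each statement changes only its own fields, the final row carries both the new phone number and the new vitals.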

But what if this happens in a distributed database? Is there a problem? Probably there is, and here is why.

  1. If we want database nodes to be live/active while isolated (e.g. due to network errors) then we will probably end up with a requirement to replay transactions in any order. More on this in a future blog.
  2. If we need to replay transactions in any order then our replication mechanism will copy snapshots of entire records, not just the changed fields.
  3. This means that updates can be lost, see below.

So again with our example, only this time the update from the technician is on database node 204 while the update from the nurse is on node 71. After the updates, but before the replication, here is what the record looks like on each of the two nodes:

Id     TxStamp              Name          Primary Phone   Last Vitals    Blood Pressure
6388   20101212T104452204   Katie Ronin   111.555.3333    May 18, 2010   122/79

And

Id     TxStamp              Name          Primary Phone   Last Vitals    Blood Pressure
6388   20101212T104452071   Katie Ronin   111.222.6666    Dec 12, 2010   119/78

After the replication, which again copies the entire record, the final record will be the one with the phone number update because that is the higher TxStamp. The vitals data has been lost.
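The lost update can be reproduced in a few lines. This is a minimal sketch in Python, assuming replication keeps whichever whole record has the higher TxStamp; the PatientRecord class and the merge_whole_record function are hypothetical names for illustration, not part of any product mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    id: int
    tx_stamp: str          # fixed-width stamp, so string comparison orders it
    name: str
    phone: str
    last_vitals: str
    blood_pressure: str


def merge_whole_record(a: PatientRecord, b: PatientRecord) -> PatientRecord:
    """Whole-record replication: the record with the higher TxStamp wins."""
    return a if a.tx_stamp > b.tx_stamp else b


# State on each node after the local updates, before replication.
node_204 = PatientRecord(6388, "20101212T104452204", "Katie Ronin",
                         "111.555.3333", "May 18, 2010", "122/79")
node_71 = PatientRecord(6388, "20101212T104452071", "Katie Ronin",
                        "111.222.6666", "Dec 12, 2010", "119/78")

merged = merge_whole_record(node_204, node_71)
print(merged.phone, merged.last_vitals, merged.blood_pressure)
```

The merged record keeps the new phone number but still shows the May 18 vitals, because the nurse's record was discarded as a whole rather than merged field by field.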

I’ll go into more detail on replication issues in coming blogs, but what I’ll focus on even more are strategies to avoid these issues. Among these will be strategies for distributed database normalization and distributed aggregate data. Ultimately with some good planning we can find a workable solution to the above problem while still using RDBMS technologies.

NoSQL, such as Cassandra, would give us an advantage in the problem demonstrated above. However, it may not help with other problems we will discuss. As such I’ll be making references to NoSQL concepts when showing some solutions for SQL developers.

 

Posted in Elastic Architecture, nosql

 

Microsoft Announces “Orleans” – a New Cloud Framework

01 Dec

Microsoft’s eXtreme Computing Group hit an interesting ball in play with the announcement of their “Orleans” cloud framework.  In the announcing blog, the authors write:

Orleans is a software framework for building client + cloud applications. Orleans encourages use of simple concurrency patterns that are easy to understand and implement correctly, building on an actor-like model with declarative specification of persistence, replication, and consistency and using lightweight transactions to support the development of reliable and scalable client + cloud software.

The programming model advanced by Orleans is built on “grains”: small application instances that each take one set of external inputs and then concentrate on completing the task initiated by the external inputs before turning to a second set of inputs.  Grain computations are isolated, except when they commit changes to persistent storage and make them globally visible.
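To make the grain idea concrete, here is a rough sketch in Python. It is not the Orleans API (Orleans is a C# library); CounterGrain, its methods and the dictionary used as "persistent storage" are all hypothetical stand-ins. It only illustrates the behavior described above: a grain handles one request to completion before the next, and its state is private until it commits.

```python
from collections import deque


class CounterGrain:
    """Hypothetical grain: private state, one request handled at a time."""

    def __init__(self, key, store):
        self.key = key
        self.store = store                # stand-in for persistent storage
        self.count = store.get(key, 0)    # initialized from persisted state
        self.inbox = deque()              # requests waiting to be handled

    def send(self, amount):
        """Queue a request; it is not processed until run() reaches it."""
        self.inbox.append(amount)

    def run(self):
        """Finish each request completely before turning to the next one."""
        while self.inbox:
            amount = self.inbox.popleft()
            self.count += amount                  # isolated, in-grain state
            self.store[self.key] = self.count     # commit makes it visible


store = {"visits": 10}
grain = CounterGrain("visits", store)
grain.send(1)
grain.send(5)
grain.run()
print(store["visits"])
```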

Basic load balancing of work is handled by the Orleans runtime, which activates grains by choosing a server from any within the available cloud, instantiating a grain, and initializing it with the grain’s persistent state.  Pointers to active grains, scalable into the billions, are maintained in a distributed directory based on technologies such as Pastry hash tables and Beehive-like caching.

Orleans’ elastic architecture explicitly handles entry-level bottleneck issues such as central databases by using data replication.  However, it eschews the “eventual consistency” model used by Cassandra and others in favor of a system of “lightweight, optimistic transactions” that provide durability and atomic persistence.

Orleans is a library written in C# that runs on the Microsoft .NET Framework 4.0.

More information is available directly from the authors in a PDF here:
http://research.microsoft.com/apps/pubs/?id=141999

 
 

Microsoft’s New Cloud Strategy: Let’s Support Java

02 Nov

OK, so there was no DivConq in April 2010, but if there had been, we would have posted an article about VMForce, the Java-based strategic alliance between Salesforce.com and VMware. This move allowed developers to host Spring- and Tomcat-based Java applications on top of (Sales)Force.com services.

There’s also Amazon’s Java option, which is essentially to pull up a Linux image and run your Java apps on it – now sometimes for free.

With so much of the cloud rushing to embrace Java, Microsoft took the unusual step of promising an open Java platform on its Azure cloud in 2011 at its own PDC (as reported by multiple sources).

According to eWeek’s Darryl Taft, Microsoft promises that, “this process will involve improving Java performance, Eclipse tooling and client libraries for Windows Azure. Customers can choose the Java environment of their choice and run it on Windows Azure. Improved Java Enablement will be available to customers in 2011.”

Amitabh Srivastava, senior vice president of Microsoft’s Server and Cloud Division was similarly quoted. “The further we got into this journey into the cloud, we saw that more and more people were writing cloud applications in Java.  There are three things we need to do. One is tooling; we’re going to make the whole Eclipse integration with Azure be first class. Second is we’re going to expose the APIs in Windows Azure in Java. And third we’re investing in optimizing the performance of Java applications on Windows Azure.”

Java in “the .NET cloud”? Of course, Java’s been supported in Azure for a long time, but it’s certainly not been accorded first-class status. TheRegister’s Gavin Clarke wonders if a race to the bottom in price, as well as developer accessibility, was the real driver behind this unusual move.

What’s also interesting to long-time developers is that “Visual Studio” wasn’t mentioned in the same breath as “Eclipse”, leaving one to wonder if the “Eclipse tooling” represents a new frontier in Microsoft’s vaunted “embrace and extend” strategy.

 

Posted in Amazon EC2, Azure, Cloud, Elastic Architecture

 

Intel launches bizarre “Open Data Center Alliance”

28 Oct

In August Intel announced it would acquire McAfee – the “Avis” of the anti-virus world to Symantec’s “Hertz” – for $7.7 billion. The general response in the IT community was “WTF?”

Now, Intel may have done it again by announcing  an “Open Data Center Alliance” (ODCA) that’s all about the cloud…without any support from the cloud vendor community.

“Vendors will not be members,” said Alliance steering committee member Mario Muller.

Intel’s ODCA has some laudable goals, including “federation” of cloud technology through common standards and the avoidance of vendor lock-in. It also advocates automatic and intelligent scaling of elastic resources – akin to the “elastic architecture” we advocate on this blog.

However, without any technology or cloud services to back it up, the ODCA initiative comes across as a half-hearted “Intel Inside 2.0” – maybe even the beginning of the end of a brand that rose with the PC-based datacenter and may fall with the cloud.

According to a recent TheRegister article, Kirk Skaugen, general manager of Intel’s data center group indicated that Amazon and other large cloud outfits have been asked to join ODCA, and he admitted that “there’s absolutely no way we can get to where we get where we want to be without [the big-name cloud companies].”

So how far has Intel fallen that they can announce a party promising $50 billion of captive IT spending door prizes and get stiffed by every major cloud vendor?   Maybe it was the guest list, but I don’t think so this time.    For the ODCA to succeed, Intel needs a strategic partner or two and they need them quickly.

 
 

Stored Procedures in Cassandra

18 Sep

It is a much discussed topic whether or not business logic should be present within a database. Academically the preference seems to be to separate the layers. However, one of the best performing applications I ever worked on placed the lion’s share of the business logic within the database – that was with another nosql database, MUMPS.

Out in the wild the reality is that there are many variations – sometimes business logic is distributed across many servers, sometimes it is all on one server. And sometimes stored procedures are used but contain a minimum of business logic and instead enhance the efficiency of data storage or retrieval.

It is my opinion that Cassandra should support an easy-to-use model of stored procedures and let the user decide the level of appropriateness for their software needs. One of the concerns often associated with stored procedures is the lack of portability amongst database vendors and the problem of database vendor lock-in. With Cassandra you have that concern with the data model anyway, so an investment in Cassandra suggests you may wish to try to get the most out of what the software has to offer.

Another challenge of stored procedures is the often complex and unfamiliar language syntax/api. I find TSQL fairly reasonable to work with, for example, but it can be a jolt to work with it if you spend much time coding in Java or C#. The stored procedure solution we are developing for Cassandra will leverage the familiarity of Javascript.

With the emergence of Web 2.0 and HTML 5, Javascript is playing an ever more important role on the client. Along with this is a revived interest in Javascript on the server. With our enhancements to Cassandra we’ll offer a complete end-to-end model of developing applications using exclusively Javascript. Indeed, by blending the differences between web services and stored procedures our enhancements will provide an efficient and concise programming model that supports very high performance code.

And let’s not forget about distributed computing; it is our primary focus after all. An obvious, but still beautiful, outcome of using Cassandra to *store* the stored procedures and web services (and web sites) is that you do not have to replicate your code across all the nodes – Cassandra already does that. Updating your web service or stored procedure is as simple as a single update or insert call to Cassandra.

What’s more is that we will provide the ability to version your stored procedures and web services. With this will come the ability to select which version is active. This feature enables a complete roll-out (replication) of a new version while the old version is still running, then flip a switch and all the nodes will start using the new code. Flip a switch again if you need to revert to the old version.
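The versioning idea can be sketched independently of Cassandra. The following Python sketch is only an illustration under stated assumptions: ProcedureStore and its methods are hypothetical names, an in-memory dictionary stands in for the replicated column family, and Python callables stand in for the Javascript source that would actually be stored.

```python
class ProcedureStore:
    """Hypothetical stand-in for a replicated table of stored procedures."""

    def __init__(self):
        self.versions = {}   # (name, version) -> callable
        self.active = {}     # name -> currently active version

    def publish(self, name, version, func):
        """Roll out a version without changing which one is active."""
        self.versions[(name, version)] = func

    def activate(self, name, version):
        """Flip the switch: later calls use this version."""
        if (name, version) not in self.versions:
            raise KeyError(f"{name} v{version} has not been published")
        self.active[name] = version

    def call(self, name, *args):
        """Run whichever version of the procedure is currently active."""
        return self.versions[(name, self.active[name])](*args)


procs = ProcedureStore()
procs.publish("greet", 1, lambda who: f"Hello {who}")
procs.activate("greet", 1)

procs.publish("greet", 2, lambda who: f"Hi {who}!")   # rolled out, not active
before = procs.call("greet", "Katie")                 # still served by v1

procs.activate("greet", 2)                            # switch to the new code
after = procs.call("greet", "Katie")

procs.activate("greet", 1)                            # revert if needed
reverted = procs.call("greet", "Katie")
print(before, after, reverted, sep=" | ")
```

Keeping the active-version pointer separate from the stored code is what allows the new version to finish replicating before any node starts using it.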

The addition of stored procedures and web services will open a lot of new possibilities with Cassandra and we are excited to be innovating in this area. A strong database is the key to a strong distributed system.

 

Posted in Cassandra, Elastic Architecture, MUMPS, nosql