RSS
 

Utility to Denormalize JSON Structures by Consolidating Two Object Trees

28 Jul

This article is about a DivConq utility that denormalizes JSON structures by consolidating two object trees at a time.

If you need to learn about JSON first, start here.  If you’re not sure why you’d want to denormalize JSON structures, read this article first.

Downloading and Installing

This utility was written in C# and cross-compiled in for .NET and Mono.  These compilations are identical except:

  • The .NET compilation supports console text colors – the Mono compilation does not
  • The Mono compilation does not (yet) support “utf32″ encoding

To get the .NET executables, download this URL:
http://www.divconq.com/wp-content/uploads/2013/05/JSONUtils_dotnet_v1001.zip

To get the Mono executables, download this URL:
http://www.divconq.com/wp-content/uploads/2013/05/JSONUtils_mono_v1001.zip

After downloading this Zip file, unpack it into a local directory and run your choice of “dotnet” or “mono” executable (each is unpacked to a folder of that name).

Command Syntax

ConsolidateJSONObjects v1.0 - July 26, 2010
  Denormalizes/distributes one JSON object into another.
  Please visit http://www.divconq.com for more information.
  This utility uses the Procurios C#.NET JSON library under the MIT license.

  Usage:
   ConsolidateJSONObjects InputFile OutputFile MergeIntoObject MergeFromObject
     MergeAsName ForeignKey [DeleteFrom] [FormatHack] [DebugLevel]
     ...where:
       InputFile = name or path of input file containing original JSON
       OutputFile = name or path of output file to write final JSON into
       MergeIntoObject = name of JSON object into which object will be merged
       MergeFromObject = name of JSON object to read data from for merge
       MergeAsName = name template of new grandchild objects created in the
           MergeIntoObject - this parameter uses a [ID] macro, e.g., car_[ID]
       ForeignKey = name of column which contains the foreign key value used
           to connect MergeFromObject children with MergeToObject children
       DeleteFrom = Set to NO to preserve MergeFromObject in OutputFile
           Default is YES, which causes MergeFromObject to be deleted.
       FormatHack = Set to NO to prevent utility from injecting CRLF to
           make the output prettier.  Default is YES.
       DebugLevel = Set to 1 to see only warnings, set to 0 to hide those
           too.  Default is 2.  Set to 3 to see a little more information.

  Example: (the following two invocations do exactly the same thing)
   ConsolidateJSONObjects forestandtrees.txt justforests.txt forests
     famoustrees famoustree_[ID] forestID
   ConsolidateJSONObjects forestandtrees.txt justforests.txt forests
     famoustrees famoustree_[ID] forestID YES YES 2

Short Example of Use

Now is a good time to go back and re-read the “Migrate a Relational Database Structure into a NoSQL Cassandra Structure (Part I)” article.  This is exactly what we’re going to do with the ConsolidateJSONObjects utility in this example.

To get started, download the input data file containing the “forests” and “famoustrees” object trees from this URL.  To conserve keystrokes,  save this file in the same directory as either your mono or dotnet edition of ConsolidateJSONObjects.exe.

Next, open a console or terminal and CD into the directory with your  “ConsolidateJSONObjects.exe” executable and the “forestandtrees.txt” file you just downloaded.  Run “ConsolidateJSONObjects.exe” without any arguments – expect to see the console help for the utility displayed.

Now let’s try it again with all six of the required arguments filled out:

  • Input and output JSON files: pretty self-explanatory
  • The “MergeInto” object is where we want all our data to end up.  In this case it’s the “forests” object.
  • The “MergeFrom” object is the object we want to collapse.  In this case it’s the “famoustrees” object.
  • “MergeAsName” is completely up to you: it determines the name/key of the new object added to each child of the MergeInto object.  Unless you’re merging “1:1″ object relationships, this value should contain the string “[ID]”, which will help ensure that object names/keys rename unique in their parent nodes.
  • “ForeignKey” is the name/key of existing grandchildren of the MergeFrom object and is used to link the children of the MergeInto object with the children from the MergeFrom object

The expected output from this command is shown below.  Notice the one warning – this indicates that there was a “famoustree” that could not be chained up to specific “forest”.

C:\jsontools\dotnet>consolidatejsonobjects forestandtrees.txt JSONout.txt forests famoustrees famoustree_[ID] forestID
00:10:38 Found expected organization in the “forests” object.
00:10:38 Found expected organization in the “famoustrees” object.
00:10:38 WARNING: tree23456 Foreign Key=forest002 not found in target JSON struc
ture!
00:10:38 Completed OK. Moved 3 children and found 1 orphans.

Open the “JSONout.txt” file to view the output.  You’ll notice that it’s not as pretty as my hand-formatted input, but the utility revalidates output JSON before writing it so at least you know it’s real JSON.

Take a close look at the “Black Forest” forest entry and notice that it’s grown two “famoustree_*” children, each with a unique name derived from it’s unique key from the old famoustree object.  Also notice that the whole famoustree object is missing from the output – even the entry that generated the warning because it couldn’t be matched was deleted.

If you open the original “forestandtrees.txt” JSON file you should be able to find the source of the warning by searching for the value of “tree23456″.  You can practice taking out this node or adding a stub for the proper forests entry to see how the utility responds, or test it with other data of your choice.

Once you are comfortable with the way the basic six parameters work and consolidation happens, move on and experiment with the three optional parameters.

First, add a value of “NO” to the end of your six values in the seventh slot (the “DeleteFrom” parameter) to prevent the MergeFrom object (e.g., “famoustrees”) from being deleted.  You might use this in a denormalization scenario where you need to consolidate multiple sets of objects (though multiple invocations of this utility) and you need to keep the MergeFrom object around until the final merge.

Next, add a second value of “NO” in the eighth slot (the “FormatHack ” parameter) to prevent the utility from trying to add some whitespace and line feeds to your JSON output.  You would use this if the utility’s attempts to make the JSON output pretty were messing up the data in the JSON output or yielding invalid JSON.  (The format hack also incurs a small, usually negligible performance hit; you could use this flag to avoid that too.)

Finally, try different values from 0-3 in the ninth slot (the “DebugLevel” parameter).  Smaller values yield less output and the utility is currently not coded to care about any value larger than 3.

Next Steps

The next article in this series will show you how to use this utility and others like it to prepare JSON output from the Northwind relational database for import into Cassandra.

About Jonathan Lampe

Author of 44 articles on this blog.

I have about 12 years of experience designing secure, partially distributed systems (e.g., web farm with some extra load in one remote data center), often in industries such as finance and defense. My solutions and software are currently deployed in mission-critical roles at about 1000 enterprises worldwide. In the last 2-3 years I have turned my technical attention toward more geographically distributed systems and heterogeneous environments (e.g., a mix of operating systems or on-premises and cloud deployments) while maintaining my focus on good UX and great security.

Published on Wednesday July 28, 2010 at 09:35am

 
Comments Off

Posted in Intermediate, JSON

 

Comments are closed.