This article is about a DivConq utility that denormalizes JSON structures by consolidating two object trees at a time.
Downloading and Installing
This utility was written in C# and cross-compiled in for .NET and Mono. These compilations are identical except:
- The .NET compilation supports console text colors – the Mono compilation does not
- The Mono compilation does not (yet) support “utf32″ encoding
To get the .NET executables, download this URL:
To get the Mono executables, download this URL:
After downloading this Zip file, unpack it into a local directory and run your choice of “dotnet” or “mono” executable (each is unpacked to a folder of that name).
ConsolidateJSONObjects v1.0 - July 26, 2010 Denormalizes/distributes one JSON object into another. Please visit http://www.divconq.com for more information. This utility uses the Procurios C#.NET JSON library under the MIT license. Usage: ConsolidateJSONObjects InputFile OutputFile MergeIntoObject MergeFromObject MergeAsName ForeignKey [DeleteFrom] [FormatHack] [DebugLevel] ...where: InputFile = name or path of input file containing original JSON OutputFile = name or path of output file to write final JSON into MergeIntoObject = name of JSON object into which object will be merged MergeFromObject = name of JSON object to read data from for merge MergeAsName = name template of new grandchild objects created in the MergeIntoObject - this parameter uses a [ID] macro, e.g., car_[ID] ForeignKey = name of column which contains the foreign key value used to connect MergeFromObject children with MergeToObject children DeleteFrom = Set to NO to preserve MergeFromObject in OutputFile Default is YES, which causes MergeFromObject to be deleted. FormatHack = Set to NO to prevent utility from injecting CRLF to make the output prettier. Default is YES. DebugLevel = Set to 1 to see only warnings, set to 0 to hide those too. Default is 2. Set to 3 to see a little more information. Example: (the following two invocations do exactly the same thing) ConsolidateJSONObjects forestandtrees.txt justforests.txt forests famoustrees famoustree_[ID] forestID ConsolidateJSONObjects forestandtrees.txt justforests.txt forests famoustrees famoustree_[ID] forestID YES YES 2
Short Example of Use
Now is a good time to go back and re-read the “Migrate a Relational Database Structure into a NoSQL Cassandra Structure (Part I)” article. This is exactly what we’re going to do with the ConsolidateJSONObjects utility in this example.
To get started, download the input data file containing the “forests” and “famoustrees” object trees from this URL. To conserve keystrokes, save this file in the same directory as either your mono or dotnet edition of ConsolidateJSONObjects.exe.
Next, open a console or terminal and CD into the directory with your “ConsolidateJSONObjects.exe” executable and the “forestandtrees.txt” file you just downloaded. Run “ConsolidateJSONObjects.exe” without any arguments – expect to see the console help for the utility displayed.
Now let’s try it again with all six of the required arguments filled out:
- Input and output JSON files: pretty self-explanatory
- The “MergeInto” object is where we want all our data to end up. In this case it’s the “forests” object.
- The “MergeFrom” object is the object we want to collapse. In this case it’s the “famoustrees” object.
- “MergeAsName” is completely up to you: it determines the name/key of the new object added to each child of the MergeInto object. Unless you’re merging “1:1″ object relationships, this value should contain the string “[ID]”, which will help ensure that object names/keys rename unique in their parent nodes.
- “ForeignKey” is the name/key of existing grandchildren of the MergeFrom object and is used to link the children of the MergeInto object with the children from the MergeFrom object
The expected output from this command is shown below. Notice the one warning – this indicates that there was a “famoustree” that could not be chained up to specific “forest”.
00:10:38 Found expected organization in the “forests” object.
00:10:38 Found expected organization in the “famoustrees” object.
00:10:38 WARNING: tree23456 Foreign Key=forest002 not found in target JSON struc
00:10:38 Completed OK. Moved 3 children and found 1 orphans.
Open the “JSONout.txt” file to view the output. You’ll notice that it’s not as pretty as my hand-formatted input, but the utility revalidates output JSON before writing it so at least you know it’s real JSON.
Take a close look at the “Black Forest” forest entry and notice that it’s grown two “famoustree_*” children, each with a unique name derived from it’s unique key from the old famoustree object. Also notice that the whole famoustree object is missing from the output – even the entry that generated the warning because it couldn’t be matched was deleted.
If you open the original “forestandtrees.txt” JSON file you should be able to find the source of the warning by searching for the value of “tree23456″. You can practice taking out this node or adding a stub for the proper forests entry to see how the utility responds, or test it with other data of your choice.
Once you are comfortable with the way the basic six parameters work and consolidation happens, move on and experiment with the three optional parameters.
First, add a value of “NO” to the end of your six values in the seventh slot (the “DeleteFrom” parameter) to prevent the MergeFrom object (e.g., “famoustrees”) from being deleted. You might use this in a denormalization scenario where you need to consolidate multiple sets of objects (though multiple invocations of this utility) and you need to keep the MergeFrom object around until the final merge.
Next, add a second value of “NO” in the eighth slot (the “FormatHack ” parameter) to prevent the utility from trying to add some whitespace and line feeds to your JSON output. You would use this if the utility’s attempts to make the JSON output pretty were messing up the data in the JSON output or yielding invalid JSON. (The format hack also incurs a small, usually negligible performance hit; you could use this flag to avoid that too.)
Finally, try different values from 0-3 in the ninth slot (the “DebugLevel” parameter). Smaller values yield less output and the utility is currently not coded to care about any value larger than 3.
The next article in this series will show you how to use this utility and others like it to prepare JSON output from the Northwind relational database for import into Cassandra.
Published on Wednesday July 28, 2010 at 09:35am