About

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. Like XML, it can be used to exchange data, but it has several advantages over XML: it is simple, its format is self-documenting, and it is much shorter because there is no data configuration overhead. That is why JSON is considered a fat-free alternative to XML.

However, the purpose of this post is not to discuss the pros and cons of JSON versus XML. Although JSON is one of the most widely used data-interchange formats, there is still room for improvement. For instance, JSON uses quotes excessively and key names are very often repeated. JSON compression algorithms address this problem, and more than one is available. This post analyzes two JSON compression algorithms and concludes whether JSON compression is useful and when it should be used.

Compressing JSON with CJSON algorithm

CJSON compresses JSON with automatic type extraction. It tackles the most pressing problem: the need to repeat key names over and over. Using this compression algorithm, the following JSON:

[
  { // This is a point
    "x": 100, 
    "y": 100
  }, { // This is a rectangle
    "x": 100, 
    "y": 100,
    "width": 200,
    "height": 150
  },
  {} // an empty object
]

can be compressed as:

{
  "templates": [ 
    [0, "x", "y"], [1, "width", "height"] 
  ],
  "values": [ 
    { "values": [ 1,  100, 100 ] }, 
    { "values": [2, 100, 100, 200, 150 ] }, 
    {} 
  ]
}
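To make the packed structure easier to read, here is a minimal Python sketch that expands it back into the original objects. It is not the CJSON library's code. It assumes, based only on the example above, that the first element of each template is the 1-based index of a parent template (0 meaning none) and that the first element of each inner "values" array is the 1-based index of the template to apply. The function names expand_template and unpack are hypothetical, and only flat objects are handled.

```python
def expand_template(templates, index):
    """Return the full key list for the 1-based template `index`.

    Assumption: element 0 of a template is its parent's 1-based index
    (0 = no parent); the remaining elements are the keys it adds.
    """
    keys = []
    while index != 0:
        parent, *own_keys = templates[index - 1]
        keys = own_keys + keys  # parent keys come first
        index = parent
    return keys


def unpack(packed):
    """Expand the packed structure into a list of flat dictionaries."""
    result = []
    for item in packed["values"]:
        row = item.get("values")
        if row is None:  # an empty object carries no template reference
            result.append({})
            continue
        template_index, *values = row
        keys = expand_template(packed["templates"], template_index)
        result.append(dict(zip(keys, values)))
    return result


packed = {
    "templates": [[0, "x", "y"], [1, "width", "height"]],
    "values": [
        {"values": [1, 100, 100]},
        {"values": [2, 100, 100, 200, 150]},
        {},
    ],
}
print(unpack(packed))
```

Run on the example, this yields the point, the rectangle and the empty object, which shows how the second template reuses the keys of the first instead of repeating them.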

A more detailed description of the compression algorithm, along with the source code, can be found here:

Compressing JSON with HPack algorithm

JSON.hpack is a lossless, cross-language, performance-focused data set compressor. It can reduce the number of characters used to represent a generic homogeneous collection by up to 70%. The algorithm provides several levels of compression (from 0 to 4). Level 0 performs the most basic compression: it removes the keys (property names) from the structure and creates a header at index 0 that lists each property name. Higher levels reduce the size of the JSON further by assuming that there are duplicated entries.

For the following JSON:

[{
  "name": "Andrea",
  "age": 31,
  "gender": "Male",
  "skilled": true
}, {
  "name": "Eva",
  "age": 27,
  "gender": "Female",
  "skilled": true
}, {
  "name": "Daniele",
  "age": 26,
  "gender": "Male",
  "skilled": false
}]

the hpack algorithm produces a compressed version which looks like this:

[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
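The header-plus-rows idea behind level 0 can be sketched in a few lines of Python. This is an illustration of the technique that mirrors the output shown above, not the JSON.hpack library's API. The function names pack_homogeneous and unpack_homogeneous are hypothetical, and the sketch assumes every object in the list has exactly the same keys.

```python
def pack_homogeneous(records):
    """Turn a list of same-shaped dicts into [header, row, row, ...]."""
    if not records:
        return []
    keys = list(records[0])  # header at index 0: the property names
    return [keys] + [[record[key] for key in keys] for record in records]


def unpack_homogeneous(packed):
    """Inverse operation: rebuild the list of dicts from header and rows."""
    if not packed:
        return []
    keys, *rows = packed
    return [dict(zip(keys, row)) for row in rows]


people = [
    {"name": "Andrea", "age": 31, "gender": "Male", "skilled": True},
    {"name": "Eva", "age": 27, "gender": "Female", "skilled": True},
    {"name": "Daniele", "age": 26, "gender": "Male", "skilled": False},
]
packed = pack_homogeneous(people)
assert unpack_homogeneous(packed) == people  # the round trip is lossless
```

Each key name is stored once in the header instead of once per object, which is where the saving on large homogeneous collections comes from.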

More details about the hpack algorithm can be found on the project home page.

Analysis

The purpose of this analysis is to compare the JSON compression algorithms described above. We use 5 files with JSON content of different sizes, ranging from 50 KB to 1 MB. Each JSON file is served to a browser using a servlet container (Tomcat) with the following transformations:

Unmodified JSON - no change on the server side
Minimized JSON - whitespace and new lines removed (the most basic JS optimization)
JSON compressed using the CJSON algorithm
JSON compressed using the HPack algorithm
Gzipped JSON - no other change on the server side
Gzipped and minimized JSON
Gzipped JSON compressed using the CJSON algorithm
Gzipped JSON compressed using the HPack algorithm
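As a rough illustration of how such sizes can be measured, the Python sketch below computes the byte counts for the unmodified, minimized and gzipped variants of a small generated data set. It is not the benchmark setup used in this post: the sample data and the function name transformation_sizes are made up for the example, and the CJSON and HPack steps are left out.

```python
import gzip
import json


def transformation_sizes(data):
    """Return the size in bytes of several encodings of the same data."""
    unmodified = json.dumps(data, indent=2).encode("utf-8")
    # Minimized: no whitespace between tokens
    minimized = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return {
        "unmodified": len(unmodified),
        "minimized": len(minimized),
        "gzipped": len(gzip.compress(unmodified)),
        "gzipped_minimized": len(gzip.compress(minimized)),
    }


# Hypothetical sample: a homogeneous collection with repeated key names
sample = [
    {"name": f"user{i}", "age": 20 + i % 40, "skilled": i % 2 == 0}
    for i in range(200)
]
for label, size in transformation_sizes(sample).items():
    print(f"{label}: {size} bytes")
```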

Results

This table contains the results of the benchmark. Each row contains one of the transformations mentioned earlier, and there is one column for each of the 5 JSON files processed.

Transformation (size in bytes)	json1	json2	json3	json4	json5
Original JSON	52966	104370	233012	493589	1014099
Minimized	33322	80657	180319	382396	776135
Compressed with CJSON	24899	48605	108983	231760	471230
Compressed with HPack	5727	10781	23162	49099	99575
Gzipped	2929	5374	11224	23167	43550
Gzipped and minimized	2775	5035	10411	21319	42083
Gzipped and compressed with CJSON	2568	4605	9397	19055	37597
Gzipped and compressed with HPack	1982	3493	6981	13998	27358

Relative size of transformations (%)

The relative-size graphic shows whether the size of the JSON being compressed affects the efficiency of compression or minimization. You can notice the following:

Minimization is much more efficient for smaller files (the result is ~60% of the original size).
For large and very large JSON files, minimization has constant efficiency (~75% of the original size).
The compression algorithms have the same efficiency for any size of JSON file.
The CJSON compression algorithm is less efficient (~45% of the original size) than the HPack algorithm (~10% of the original size).
The CJSON compression algorithm is slower than the HPack algorithm.
Gzipped content has almost the same size as content that is compressed and then gzipped.
Combining compression or minimization with gzip does not improve efficiency significantly (only by about 1-2% of the original size).
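The relative sizes can be recomputed from the byte counts in the results table. The short Python sketch below does this arithmetic: each value is divided by the original size of the same file and expressed as a percentage. The numbers are copied from the table; the dictionary layout and the function name relative_sizes are just one convenient way to organize them.

```python
SIZES = {
    "Original": [52966, 104370, 233012, 493589, 1014099],
    "Minimized": [33322, 80657, 180319, 382396, 776135],
    "CJSON": [24899, 48605, 108983, 231760, 471230],
    "HPack": [5727, 10781, 23162, 49099, 99575],
    "Gzipped": [2929, 5374, 11224, 23167, 43550],
    "Gzipped + minimized": [2775, 5035, 10411, 21319, 42083],
    "Gzipped + CJSON": [2568, 4605, 9397, 19055, 37597],
    "Gzipped + HPack": [1982, 3493, 6981, 13998, 27358],
}


def relative_sizes(sizes, baseline="Original"):
    """Express every row as a percentage of the baseline row, per file."""
    base = sizes[baseline]
    return {
        label: [round(100 * value / original, 1)
                for value, original in zip(values, base)]
        for label, values in sizes.items()
    }


for label, percentages in relative_sizes(SIZES).items():
    print(f"{label:22s}", percentages)
```

A smaller percentage means a smaller payload, so lower is better in every row.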

Conclusion

Both JSON compression algorithms have been supported by wro4j since version 1.3.8 through the following processors: CJsonProcessor and JsonHPackProcessor. Both provide the methods pack and unpack. The underlying implementation uses the Rhino engine to run the JavaScript code on the server side.

JSON compression algorithms considerably reduce JSON file size. There are several compression algorithms; we have covered two of them, CJSON and HPack. HPack seems to be much more efficient than CJSON and also significantly faster. When two entities exchange JSON and the source compresses it before it reaches the target, the client (target) has to apply the inverse operation (unpacking), otherwise the JSON cannot be used. This introduces a small overhead, which must be taken into account when deciding whether to use JSON compression.

When gzipping of content is allowed, it is more efficient than any other compression algorithm. In conclusion, it is not worth compressing JSON on the server if the client accepts gzipped content. Compression on the server side does make sense when the client cannot handle gzipped content and it is important to keep the traffic volume as low as possible (due to cost and time).
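The decision described above can be sketched as a small Python function that inspects the Accept-Encoding request header. This is a simplified illustration, not part of wro4j: the function names, the "packed" label and the header-plus-rows fallback are assumptions made for the example, and the header parsing ignores details such as the "*" wildcard.

```python
import gzip
import json


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with q > 0."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def pack_rows(records):
    """Hypothetical fallback: header-plus-rows packing of flat objects."""
    if not records:
        return []
    keys = list(records[0])
    return [keys] + [[record[key] for key in keys] for record in records]


def encode_body(records, accept_encoding):
    """Return (body_bytes, label) where label is 'gzip' or 'packed'."""
    if accepts_gzip(accept_encoding):
        plain = json.dumps(records, separators=(",", ":")).encode("utf-8")
        return gzip.compress(plain), "gzip"
    # Client cannot handle gzip: fall back to JSON-level compression,
    # which the client must unpack before using the data.
    packed = json.dumps(pack_rows(records), separators=(",", ":"))
    return packed.encode("utf-8"), "packed"
```

With this shape, JSON-level compression is applied only to clients that do not advertise gzip support, which matches the recommendation above.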

Another use case for a JSON compression algorithm is sending large JSON content from client to server (which is sent ungzipped). In this case, it is important to unpack the JSON content on the server before consuming it.

Web Resource Optimization

Sunday, June 26, 2011

JSON Compression algorithms