About
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It can be used as a data-interchange format, just like XML, and it has several advantages over it: JSON is simple, its format is self-documenting, and it is much shorter because there is no data configuration overhead. That is why JSON is considered a fat-free alternative to XML.
However, the purpose of this post is not to discuss the pros and cons of JSON versus XML. Although JSON is one of the most widely used data-interchange formats, there is still room for improvement. For instance, JSON uses quotes excessively, and key names are very often repeated. This problem can be addressed by JSON compression algorithms, of which there is more than one. Here you'll find an analysis of two JSON compression algorithms and a conclusion on whether JSON compression is useful and when it should be used.
Compressing JSON with CJSON algorithm
CJSON compresses JSON with automatic type extraction. It tackles the most pressing problem: the need to repeat key names over and over. Using this compression algorithm, the following JSON:
[
{ // This is a point
"x": 100,
"y": 100
}, { // This is a rectangle
"x": 100,
"y": 100,
"width": 200,
"height": 150
},
{} // an empty object
]
can be compressed as:
{
"templates": [
[0, "x", "y"], [1, "width", "height"]
],
"values": [
{ "values": [ 1, 100, 100 ] },
{ "values": [2, 100, 100, 200, 150 ] },
{}
]
}
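To make the packed layout concrete, here is a minimal unpacking sketch. It is not the CJSON source code: the rules it applies are assumptions read off the example above, namely that each template is `[parent, key, ...]`, that a parent of 0 means "no parent", that template numbers are 1-based, and that each packed entry starts with its template number. The function names are hypothetical.

```python
import json


def expand_template(templates, index):
    """Return the full key list for the 1-based template `index`.

    Assumption inferred from the example: a template is [parent, key, ...];
    parent 0 means "no parent", any other number points at another template
    whose keys come first.
    """
    keys = []
    while index != 0:
        parent, *own_keys = templates[index - 1]
        keys = own_keys + keys  # parent keys precede the template's own keys
        index = parent
    return keys


def unpack(packed):
    """Rebuild the original list of objects from the packed form."""
    result = []
    for entry in packed["values"]:
        if "values" not in entry:  # an empty object stays empty
            result.append({})
            continue
        template_index, *values = entry["values"]
        keys = expand_template(packed["templates"], template_index)
        result.append(dict(zip(keys, values)))
    return result


packed = {
    "templates": [[0, "x", "y"], [1, "width", "height"]],
    "values": [
        {"values": [1, 100, 100]},
        {"values": [2, 100, 100, 200, 150]},
        {},
    ],
}

print(json.dumps(unpack(packed), indent=2))
```

The rectangle reuses the point's template as its parent, which is why the key names "x" and "y" appear only once in the packed document.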
A more detailed description of the compression algorithm, along with the source code, can be found here:
Compressing JSON with HPack algorithm
JSON.hpack is a lossless, cross-language, performance-focused data set compressor. It can reduce by up to 70% the number of characters used to represent a generic homogeneous collection.
The algorithm provides several levels of compression (from 0 to 4). Level 0 performs the most basic compression: it removes the keys (property names) from the structure and creates a header at index 0 that lists each property name. The higher levels reduce the size of the JSON even further by assuming that there are duplicated entries.
For the following JSON:
[{
"name": "Andrea",
"age": 31,
"gender": "Male",
"skilled": true
}, {
"name": "Eva",
"age": 27,
"gender": "Female",
"skilled": true
}, {
"name": "Daniele",
"age": 26,
"gender": "Male",
"skilled": false
}]
the hpack algorithm produces a compressed version which looks like this:
[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
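The level 0 transformation described above can be sketched in a few lines. This is an illustration of the idea only, not the JSON.hpack implementation: the function names `hpack_pack` and `hpack_unpack` are hypothetical, and the sketch assumes every record has the same keys in the same order (a homogeneous collection).

```python
import json


def hpack_pack(records):
    """Level-0 style packing: a header row of keys, then one row of values per record."""
    if not records:
        return []
    header = list(records[0].keys())
    rows = [[record[key] for key in header] for record in records]
    return [header] + rows


def hpack_unpack(packed):
    """Inverse operation: pair every row of values with the header keys."""
    if not packed:
        return []
    header, *rows = packed
    return [dict(zip(header, row)) for row in rows]


people = [
    {"name": "Andrea", "age": 31, "gender": "Male", "skilled": True},
    {"name": "Eva", "age": 27, "gender": "Female", "skilled": True},
    {"name": "Daniele", "age": 26, "gender": "Male", "skilled": False},
]

packed = hpack_pack(people)
print(json.dumps(packed, separators=(",", ":")))
```

Because each key name is written once instead of once per record, the saving grows with the number of records in the collection.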
More details about the hpack algorithm can be found at the project home page.
Analysis
The purpose of this analysis is to compare the JSON compression algorithms described above. To do so, we use five JSON files of different sizes, ranging from about 50 KB to 1 MB. Each JSON file is served to a browser by a servlet container (Tomcat) with the following transformations:
- Unmodified JSON - no change on the server side
- Minimized JSON - whitespace and newlines removed (the most basic JS optimization)
- Compressed JSON using CJSON algorithm
- Compressed JSON using HPack algorithm
- Gzipped JSON - the unmodified JSON, gzipped
- Gzipped and minimized JSON
- Gzipped and compressed using CJSON algorithm
- Gzipped and compressed using HPack algorithm
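For readers who want to try a similar comparison locally, the following sketch measures a few of these transformations with the Python standard library. It is only an approximation of the setup above: it does not involve Tomcat or a browser, the sample data is made up for illustration, and the function name `transformation_sizes` is hypothetical.

```python
import gzip
import json


def transformation_sizes(text):
    """Return the size in bytes of a JSON document under four transformations."""
    raw = text.encode("utf-8")
    # Minimize by re-serializing without any optional whitespace.
    minimized = json.dumps(json.loads(text), separators=(",", ":")).encode("utf-8")
    return {
        "original": len(raw),
        "minimized": len(minimized),
        "gzipped": len(gzip.compress(raw)),
        "gzipped and minimized": len(gzip.compress(minimized)),
    }


# Made-up sample: a pretty-printed homogeneous collection of 500 records.
sample = json.dumps(
    [{"name": f"user{i}", "age": 20 + i % 50, "skilled": i % 2 == 0} for i in range(500)],
    indent=4,
)

sizes = transformation_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({100 * size / sizes['original']:.1f}% of original)")
```

The absolute numbers depend heavily on the input, so they should not be expected to match the benchmark figures.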
Results
This table contains the results of the benchmark. Each row contains one of the transformations mentioned earlier. The table has 5 columns, one for each JSON file we process.
| Transformation | json1 | json2 | json3 | json4 | json5 |
|---|---|---|---|---|---|
| Original JSON size (bytes) | 52966 | 104370 | 233012 | 493589 | 1014099 |
| Minimized | 33322 | 80657 | 180319 | 382396 | 776135 |
| Compress CJSON | 24899 | 48605 | 108983 | 231760 | 471230 |
| Compress HPack | 5727 | 10781 | 23162 | 49099 | 99575 |
| Gzipped | 2929 | 5374 | 11224 | 23167 | 43550 |
| Gzipped and Minimized | 2775 | 5035 | 10411 | 21319 | 42083 |
| Gzipped and compressed with CJSON | 2568 | 4605 | 9397 | 19055 | 37597 |
| Gzipped and compressed with HPack | 1982 | 3493 | 6981 | 13998 | 27358 |
Relative size of transformations (%)
The relative size of each transformation shows whether the size of the JSON affects the efficiency of compression or minimization. You can notice the following:
- Minimization is much more efficient for the smallest file (~60% of the original size)
- For large and very large JSON files, minimization has a constant efficiency (~75% of the original size)
- The compression algorithms have the same efficiency for any size of JSON file
- The CJson compression algorithm is less efficient (~45% of the original size) than the hpack algorithm (~10% according to the results table)
- The CJson compression algorithm is slower than the hpack algorithm
- Gzipped content is almost the same size as the compressed content
- Combining compression with gzip, or minimization with gzip, doesn't improve efficiency significantly (only about 1-2 percentage points of the original size)
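The relative sizes can be recomputed directly from the results table. The short sketch below does that arithmetic, expressing each transformed size as a percentage of the original file size; the numbers are copied from the table, while the names `ORIGINAL`, `SIZES` and `relative_sizes` are introduced here for illustration.

```python
# Sizes in bytes, copied from the results table (json1 .. json5).
ORIGINAL = [52966, 104370, 233012, 493589, 1014099]

SIZES = {
    "Minimized": [33322, 80657, 180319, 382396, 776135],
    "Compress CJSON": [24899, 48605, 108983, 231760, 471230],
    "Compress HPack": [5727, 10781, 23162, 49099, 99575],
    "Gzipped": [2929, 5374, 11224, 23167, 43550],
    "Gzipped and Minimized": [2775, 5035, 10411, 21319, 42083],
    "Gzipped and compressed with CJSON": [2568, 4605, 9397, 19055, 37597],
    "Gzipped and compressed with HPack": [1982, 3493, 6981, 13998, 27358],
}


def relative_sizes(sizes, original):
    """Express every transformed size as a percentage of the original size."""
    return {
        name: [round(100 * size / base, 1) for size, base in zip(values, original)]
        for name, values in sizes.items()
    }


for name, percentages in relative_sizes(SIZES, ORIGINAL).items():
    print(f"{name}: {percentages}")
```

A lower percentage means a smaller payload relative to the unmodified file.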
Conclusion
Both JSON compression algorithms are supported by wro4j since version 1.3.8 through the following processors: CJsonProcessor and JsonHPackProcessor. Both provide the methods pack and unpack. The underlying implementation uses the Rhino engine to run the JavaScript code on the server side.
JSON compression algorithms considerably reduce JSON file size. There are several compression algorithms; we have covered two of them: CJson and HPack. HPack appears to be much more efficient than CJson and also significantly faster. When two entities exchange JSON and the source compresses it before it reaches the target, the client (target) has to apply the inverse operation (unpacking); otherwise the JSON cannot be used. This introduces a small overhead, which must be taken into account when deciding whether JSON compression should be used.
When gzipping of content is allowed, it is more efficient than any other compression algorithm. In conclusion, it is not worth compressing JSON on the server if the client accepts gzipped content. Compression on the server side does make sense when the client cannot handle gzipped content and it is important to keep the traffic volume as low as possible (due to cost and time).
Another use case for JSON compression is sending large JSON content from the client to the server (which is sent ungzipped). In this case, it is important to unpack the JSON content on the server before consuming it.