Code Craft

Software is equal parts Art, Craft, and Engineering

A Compact Pull Parser for JSON

This is another situation where I needed something leaner than what I could find on the Internet, and I found myself lying awake in bed early one morning wondering, “how hard could it be?”. Written from about 3 am to 10 am, this parser is efficient, lean and functional.

2008-02-21: Enhance to support the deviations from strict JSON syntax as options, which must be explicitly specified when the parser is constructed. This updated version also uses a nested Escape class to provide coded exceptions which are specific to the failure, for more precise error handling.
2010-07-22: Enhance to support additional options. This update also enhances the way the parser is initialized, moving the setting of the input source to separate and reusable methods with a fluent API design. This makes the parser very simple to use with multiple input sources.
2010-08-16: Fix to properly handle unnamed top-level objects; previously these objects were being treated as arrays of the preceding object. These objects are now delivered by the parser with a blank member name, and it is up to the user of the parser to appropriately act upon them. The parseObject() method example, below, has been modified to only parse the next top-level object from the stream into the supplied target object. Some typos in exceptions and JavaDoc comments were also corrected.
2012-05-14: Fix to consider all comment support to be an optional feature. I mistakenly assumed that the JSON spec allows comments.
2012-12-11: Add option to limit the characters read from an input source; useful for parsing individual JSON messages from messaging input sources like an HTTP request. Note: This feature is flawed; it counts characters, not bytes, due to the Reader wrapping, effectively making it useless for variably-encoded byte streams like UTF-8. Instead the input stream should be wrapped in a limiting filter, being careful to place this underneath any buffering filter if the underlying stream is to be used for further reading.
2013-05-25: Fix recording of event location for objects and arrays.
2013-08-29: Remove the flawed input limit feature (setInputLimit()) and modified setInput() functions to cascade to the Reader overload to eliminate some duplicate code.
2014-02-07: Fix bug in multiline comment parsing which failed to detect the comment close when immediately preceded by an odd number of asterisks (e.g **/, ****/ etc).
2014-08-25: Fix parser to ignore BOM instead of changing it to a \u200B.
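
The limiting filter suggested in the 2012-12-11 note can be sketched roughly as follows. This is only an illustration of the idea, not code from JsonParser: the class name LimitedInputStream and the helper readAll() are hypothetical. Because the filter sits underneath the Reader, it counts bytes rather than characters, so a multi-byte UTF-8 character does not throw off the limit.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical byte-limiting filter; reports end-of-input once the limit is reached. */
class LimitedInputStream extends FilterInputStream {
    private long remaining;                                 // bytes still allowed to be read

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) { return -1; }
        int b = super.read();
        if (b >= 0) { remaining--; }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) { return 0; }
        if (remaining <= 0) { return -1; }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) { remaining -= n; }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() { return false; }        // keeps the byte accounting simple

    /** Decodes at most limit bytes of UTF-8 text; the Reader is left unclosed so the source stays open. */
    static String readAll(InputStream in, long limit) {
        try {
            Reader rdr = new InputStreamReader(new LimitedInputStream(in, limit), StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            for (int c; (c = rdr.read()) >= 0; ) { sb.append((char) c); }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Any buffering belongs above this filter (here, inside the InputStreamReader), so that read-ahead never pulls bytes past the limit out of the underlying stream.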

Relaxation of the JSON Syntax

I mildly dislike a few aspects of the JSON specification (except for the requirement to quote keywords, which I detest), so my parser accepts some relaxations. Note that this still conforms to the specification in spirit, which specifically allows parsers to accept looser syntax. I want these relaxations primarily because I use JSON heavily for configuration files, not just data interchange, and they allow the configurations to be far more readable and workable for a human being.

The optional behaviors are:

  • OPT_UNQUOTED_KEYWORDS: Allow keyword strings to be unquoted.
  • OPT_EOL_IS_COMMA: Allow an end-of-line to be treated as a comma.
  • OPT_COMMENTS: Allow comments using /*...*/ and // anywhere, and * or # at the start of a line.
  • OPT_MULTILINE_STRINGS: Allow multiline strings - this permits strings to be broken over multiple lines.
  • OPT_SINGLE_QUOTE_STRINGS: Allow single-quotes to be used for strings.

The parser has additional options, not related to JSON compliance, which are described in the JavaDoc.

Unquoted Names

I find the requirement for quoting names to be draconian and downright ugly. It kills readability, and it also totally defeats a big benefit of syntax highlighting when manually editing the JSON text. Furthermore, it seems that in general the programming community agrees that the quoting of keywords is an unfortunate side-effect of deriving JSON from the JavaScript ECMA specification.

Implied Commas

In my opinion, requiring a comma at the end of line is just adding “noise”.

Comments

The parser allows single line comments starting with *, # and //, and non-nested multiline comments using /*...*/. I use an empty multi-line comment (e.g /**/) to mark temporary changes.

Comments, especially multi-line comments, are a convenience for human-maintained configuration files and for marking temporary changes made by a human editor. They are not necessary and are not permitted for data interchange, but can be quite useful in complex configuration files.
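
Finding the end of a non-nested multiline comment is easy to get subtly wrong when the close is preceded by extra asterisks (the 2014-02-07 fix above). One way to handle it, sketched here with hypothetical names rather than taken from the parser source, is to remember only whether the previous character was an asterisk:

```java
/** Hypothetical helper, not part of JsonParser. */
class BlockComments {
    /**
     * Scans a non-nested block comment.
     *
     * @param src       The text being scanned.
     * @param bodyStart Index of the first character after the opening slash-star.
     * @return          Index just past the closing star-slash, or -1 if the comment never closes.
     */
    static int skipBlockComment(CharSequence src, int bodyStart) {
        boolean star = false;                           // true when the previous character was '*'
        for (int i = bodyStart; i < src.length(); i++) {
            char c = src.charAt(i);
            if (star && c == '/') { return i + 1; }
            star = (c == '*');                          // any run of asterisks keeps the flag set
        }
        return -1;
    }
}
```

Because the flag is recomputed on every character, a run of asterisks of any length still leaves it set when the slash arrives, which is the case a "peek at the next character" approach can miss.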

Multiline Strings

This one is “pushing the friendship” just a little. Again, only for human-edited configuration files, it should be enabled only if the JSON data truly exists in a limited/closed context. That said, I have seen this used to great benefit to allow long and complex SQL statements to span lines so as to render them in a way that reflected their structure.

Single Quote Strings

This was added to meet an internal need for some specific configuration files. Among other things, these can be used to denote a character value, and validated to enforce a resulting string of exactly length 1.
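
As a rough illustration of the character-value idea, the check might look like the following. The helper charValue() is hypothetical, and the sketch assumes the value arrives with its single quotes still attached and its escapes already decoded.

```java
/** Hypothetical helper, not part of JsonParser. */
class CharValues {
    /** Converts a single-quoted value such as 'x' into a char, rejecting anything else. */
    static char charValue(String raw) {
        int len = raw.length();
        if (len < 2 || raw.charAt(0) != '\'' || raw.charAt(len - 1) != '\'') {
            throw new IllegalArgumentException("Not a single-quoted value: " + raw);
        }
        String body = raw.substring(1, len - 1);        // text between the quotes
        if (body.length() != 1) {
            throw new IllegalArgumentException("Character value must be exactly one character: " + raw);
        }
        return body.charAt(0);
    }
}
```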

Compare strict JSON with the Relaxed JSON which Follows It

Note that the relaxed example shows an additional departure from the JSON specification and assumes repeated keywords are array elements. This is not an option in the parser because whether a repeated keyword does this or generates an error is determined by the code which uses this parser. I typically use a layer on top of the parser for reading documents that treats repeated keywords as if they represent array elements, which vastly improves the readability of some complex arrays.

"ProcessingUnit": {
  "Name": "FunctionKeyHotSpots",

  "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 1.0  },

  "Defaults": [{
    "Name": "FunctionKey_Patterns1",
    "Pattern": [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]",
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]",
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]",
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }, {
    "Name": "FunctionKey_Patterns2",
    "Pattern": [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][Space?][(-:)][$Label]",
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label]",
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]",
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }, {
    "Name": "FunctionKey_Common",
    "Macro": "CMD([$Key])",
    "Metadata": { "Type": "FunctionKey", "Description": "Press F[$Key]", "Label": "[$Label]", "Value": "[$Key]" }
    }],

  "HotSpot": [{
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns1",
      "MinRow"          : "Window.Bottom-2",
      "MaxRow"          : "Window.Bottom+1",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1"] }
      },
    "InheritDefaults"   : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 2 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns2",
      "MinRow"          : "Window.Bottom-2",
      "MaxRow"          : "Window.Bottom+1",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults"   : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 3 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns1",
      "MinRow"          : "Window.Top",
      "MaxRow"          : "Window.Bottom-3",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults" : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 4 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns2",
      "MinRow"          : "Window.Top",
      "MaxRow"          : "Window.Bottom-3",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults" : "FunctionKey_Common"
    }]
  }


ProcessingUnit: {
  Name: "FunctionKeyHotSpots"

  Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 1    }

  Defaults: {
    Name: "FunctionKey_Patterns1"
    Pattern: [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]"
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]"
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }

  Defaults: {
    Name: "FunctionKey_Patterns2"
    Pattern: [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][Space?][(-:)][$Label]"
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label]"
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }

  Defaults: {
    Name: "FunctionKey_Common"
    Macro: "CMD([$Key])"
    Metadata: { Type: "FunctionKey", Description: "Press F[$Key]", Label: "[$Label]", Value: "[$Key]" }
    }

  HotSpot: {
    FindText: {
      InheritDefaults : "FunctionKey_Patterns1"
      MinRow          : "Window.Bottom-2"
      MaxRow          : "Window.Bottom+1"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 2 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns2"
      MinRow          : "Window.Bottom-2"
      MaxRow          : "Window.Bottom+1"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 3 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns1"
      MinRow          : "Window.Top"
      MaxRow          : "Window.Bottom-3"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 4 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns2"
      MinRow          : "Window.Top"
      MaxRow          : "Window.Bottom-3"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }
  }
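
The repeated Defaults and HotSpot keywords in the relaxed example rely on the layer above the parser to gather them into arrays. A minimal sketch of that behavior, using a plain Map in place of the DataStruct class seen later (the class RepeatedKeys and its addField() method are hypothetical), could be:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the document layer; not the author's DataStruct. */
class RepeatedKeys {
    /** Adds a member; a second value under the same name promotes the entry to a list. */
    @SuppressWarnings("unchecked")
    static void addField(Map<String, Object> obj, String name, Object value) {
        if (!obj.containsKey(name)) {
            obj.put(name, value);                       // first occurrence: store the value as-is
            return;
        }
        Object existing = obj.get(name);
        if (existing instanceof List) {
            ((List<Object>) existing).add(value);       // third and later occurrences: append
        } else {
            List<Object> list = new ArrayList<Object>();
            list.add(existing);                         // second occurrence: promote to a list
            list.add(value);
            obj.put(name, list);
        }
    }
}
```

One limitation of this sketch is that it cannot tell a promoted list from a member whose own value is a list; a real layer would need a distinct wrapper type or flag for that.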

How It Works

Essentially, the caller calls next() in a loop, and each call returns a parsing event code. For each event, the caller queries the parser for the details it needs to process the parsed data. The parser takes care of all decoding and delivers values as Strings, which the caller may then convert into Java objects and values. Note that although a String value has its ECMA escapes decoded, the quotes are left on so that string values can be differentiated from numbers and from true, false and null.
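
To make the shape of that loop concrete, here is a deliberately tiny pull parser. It is a toy written for this illustration, not the real JsonParser: the class TinyPull and its dump() helper are hypothetical, and it ignores arrays, escapes, comments and all error checking. Only the event names and the getMemberName()/getMemberValue() accessors mirror those used in the listings further down.

```java
/** Toy pull parser for illustration only; handles objects and simple members, nothing else. */
class TinyPull {
    static final int EVT_INPUT_ENDED = 0, EVT_OBJECT_BEGIN = 1, EVT_OBJECT_ENDED = 2, EVT_OBJECT_MEMBER = 3;

    private final String src;
    private int pos;
    private String name = "", value = "";

    TinyPull(String src) { this.src = src; }

    String getMemberName()  { return name; }
    String getMemberValue() { return value; }

    /** Advances to the next event; name and value are valid until the following call. */
    int next() {
        name = "";
        value = "";
        StringBuilder tok = new StringBuilder();
        boolean inQuotes = false;
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (inQuotes) {                             // in-quotes state: copy verbatim to the closing quote
                tok.append(c);
                if (c == '"') { inQuotes = false; }
            } else if (c == '"') {
                tok.append(c);                          // quotes stay on the token
                inQuotes = true;
            } else if (c == ':') {                      // divider: the token so far was the keyword
                name = stripQuotes(tok.toString());
                tok.setLength(0);
            } else if (c == '{') {
                return EVT_OBJECT_BEGIN;
            } else if (c == ',' || c == '}') {
                if (tok.length() > 0) {                 // a value was pending: deliver it first
                    value = tok.toString();
                    if (c == '}') { pos--; }            // re-read the brace on the next call
                    return EVT_OBJECT_MEMBER;
                }
                if (c == '}') { return EVT_OBJECT_ENDED; }
            } else if (!Character.isWhitespace(c)) {
                tok.append(c);
            }
        }
        return EVT_INPUT_ENDED;
    }

    static String stripQuotes(String s) {
        boolean quoted = s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"';
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    /** The caller's side: loop on next() and query the details for each event. */
    static String dump(String json) {
        TinyPull psr = new TinyPull(json);
        StringBuilder out = new StringBuilder();
        for (int evt; (evt = psr.next()) != EVT_INPUT_ENDED; ) {
            switch (evt) {
                case EVT_OBJECT_BEGIN:  out.append("begin ").append(psr.getMemberName()).append('\n'); break;
                case EVT_OBJECT_MEMBER: out.append("member ").append(psr.getMemberName()).append('=').append(psr.getMemberValue()).append('\n'); break;
                case EVT_OBJECT_ENDED:  out.append("end\n"); break;
            }
        }
        return out.toString();
    }
}
```

Note how the member value for a string keeps its quotes, so the caller can still tell "1" the string from 1 the number.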

The essence of the parser is the next() method, which implements a low-level state machine (the events returned themselves constitute a higher level state machine). Most of the rest of the class is methods for accessing the values for the higher level events.

After an up-front test for comments (my parser allows single line comments starting with *, # and //, and non-nested multiline comments using /* ... */), the parser arrives at the next event by progressing through keyword, in-quotes, divider and value states. Parsing is made considerably simpler by the fact that the JSON syntax is simple and rigorous: the rules are strict, and any violation results in an exception.

Two skip methods, one for objects and one for arrays, provide for convenient stream processing of select members.
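
Skipping can be thought of as consuming events while tracking nesting depth. The sketch below shows the idea over a plain iterator of event codes; the class SkipSketch and its constants are hypothetical, and the real skipObject() may well be implemented differently (for instance, at the character level).

```java
import java.util.Iterator;

/** Hypothetical illustration of skipping by depth counting; not the parser's own code. */
class SkipSketch {
    static final int EVT_OBJECT_BEGIN = 1, EVT_OBJECT_ENDED = 2, EVT_OBJECT_MEMBER = 3;

    /**
     * Consumes events up to and including the end of the object whose begin event was just read.
     *
     * @return The number of events consumed.
     */
    static int skipObject(Iterator<Integer> events) {
        int depth = 1;                                  // the caller has already seen the opening event
        int consumed = 0;
        while (depth > 0 && events.hasNext()) {
            int evt = events.next();
            consumed++;
            if (evt == EVT_OBJECT_BEGIN)      { depth++; }
            else if (evt == EVT_OBJECT_ENDED) { depth--; }
        }
        if (depth > 0) { throw new IllegalStateException("Input ended inside an object"); }
        return consumed;
    }
}
```

After the call, the next event read belongs to the enclosing object, which is what lets a caller pick out select members from a stream and ignore the rest.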

A very simple layer on top of the pull parser can load a document into a Java multi-map structure - here is an example. This example uses a Callback to actually create any objects, delegating the responsibility back to the calling code, allowing very precise control (a default callback method is also shown). Note that this example turns repeated keywords into an array.

Document Builder Code

/**
 * Parse a generalized data structure from a JSON input stream.
 * <p>
 * All values are added using the <code>crtmbrcbk</code> callback.
 * <p>
 * <b><u>Reminder</u></b>
 * <p>
 * When using a reflected method, don't forget to configure your code obfuscator to retain it in unobfuscated form.
 *
 * @param psr       The parser to use.
 * @param tgt       Target object to which to add members; if this is null a new object is created using the callback.
 * @param maxlvl    Maximum level to recursively parse substructures, including arrays (objects at a deeper level are silently ignored).
 * @param crtmbrcbk A callback object invoked to create a member value.
 */
static public Object parseNextObject(JsonParser psr, Object tgt, int maxlvl, Callback crtmbrcbk) {
    return _parseObject(psr,tgt,maxlvl,new Callback.WithParms(crtmbrcbk,4),false,true);
    }

static private Object _parseObject(JsonParser psr, Object tgt, int maxlvl, Callback.WithParms crtmbrcbk, boolean arr, boolean ignenc) {
    int                                 evt;                                                    // event code

    if(tgt==null) { tgt=crtmbrcbk.invoke(psr,null,"",null); }

    while((evt=psr.next())!=JsonParser.EVT_INPUT_ENDED && evt!=JsonParser.EVT_OBJECT_ENDED && evt!=JsonParser.EVT_ARRAY_ENDED) {
        String  nam=psr.getMemberName();

        switch(evt) {
            case JsonParser.EVT_OBJECT_BEGIN : {
                if(nam.length()==0 && ignenc) {
                    continue;
                    }
                ignenc=false;
                if(maxlvl>1) { _parseObject(psr,crtmbrcbk.invoke(psr,tgt,nam,null,Boolean.FALSE),(maxlvl-1),crtmbrcbk,false,false); }
                else         { psr.skipObject();                                                                                    }
                } break;

            case JsonParser.EVT_ARRAY_BEGIN : {
                if(!arr) {
                    _parseObject(psr,tgt,maxlvl,crtmbrcbk,true,false);                          // first level of any array is added directly to the inherently list-supporting object
                    }
                else {
                    if(maxlvl>1) { _parseObject(psr,crtmbrcbk.invoke(psr,tgt,nam,null),(maxlvl-1),crtmbrcbk,true,false); }
                    else         { psr.skipArray();                                                                      }
                    }
                } break;

            case JsonParser.EVT_OBJECT_MEMBER : {
                crtmbrcbk.invoke(psr,tgt,nam,psr.getMemberValue());
                } break;
            }
        }
    return tgt;
    }

Default Member Creation Code

/**
 * Default callback method for creating a typed value and/or adding it to a DataStruct.
 * <p>
 * The rules used to determine type are the standard JSON interpretation of the input value, except that values beginning <i>0x</i> are also accepted as hexadecimal integers:
 * <ol>
 *   <li>A null value is returned as a new DataStruct().
 *   <li>A quoted value is returned as a String with the quotes stripped.
 *   <li>A value of <i>null</i> is returned as JsonUtil.NULL.
 *   <li>A value of <i>true</i> is returned as Boolean.TRUE.
 *   <li>A value of <i>false</i> is returned as Boolean.FALSE.
 *   <li>A value beginning <i>0x</i> is parsed as a hexadecimal value and returned as an Integer.
 *   <li>Any other value is parsed as a double value and returned as a Double.
 *   </ol>
 * <p>
 *
 * @param tgt       The optional target to which to add the new member - if null the member object is created without adding it to anything.
 * @param nam       The name to use to add the new member.
 * @param val       The value of the member to add - must be null or String.
 * @return          The newly created object, which is one of: DataStruct, String, JsonUtil.NULL, Boolean, Integer, or Double.
 */
static public Object callback_crtMemberDefault(JsonParser psr, Object tgt, String nam, String val, boolean arr) {
    Object                              ret;

    if     (val==null                    ) { ret=new DataStruct();            }
    else if(JsonParser.isQuoted(val)     ) { ret=JsonParser.stripQuotes(val); }
    else if(val.equalsIgnoreCase(""     )) { ret="";                          }
    else if(val.equalsIgnoreCase("null" )) { ret=NULL;                        }
    else if(val.equalsIgnoreCase("true" )) { ret=Boolean.TRUE;                }
    else if(val.equalsIgnoreCase("false")) { ret=Boolean.FALSE;               }
    else if(val.startsWith("0x")) {
        try { ret=Integer.valueOf(val.substring(2),16); } catch(NumberFormatException thr) { throw new JsonEscape(JsonEscape.INVALID_VALUE,"Invalid hexadecimal number '"+val+"'",thr); }
        }
    else {
        try { ret=Double.valueOf(val);                  } catch(NumberFormatException thr) { throw new JsonEscape(JsonEscape.INVALID_VALUE,"Invalid decimal number '"    +val+"'",thr); }
        }

    if(tgt!=null) {
        if(nam.length()==0) {
            throw new JsonEscape(JsonEscape.MALFORMED,"Object with a blank name is not permitted (this is usually caused by a malformed array or multiple top-level enclosing objects); at "+psr.getInputLocation());
            }
        ((DataStruct)tgt).addField(nam,ret);
        }

    return ret;
    }

Get The Source

The source compiles to Java 5, but only minor changes should be required to target as far back as Java 2.

Download JsonParser.java.