json ruby

Is there any way to parse JSON with trailing commas in Ruby?


I'm currently coding a transition from a system that used hand-crafted JSON files to one that can automatically generate the JSON files. The old system works; the new system works; what I need to do is transfer data from the old system to the new one.

The JSON files are used by an iOS app to provide functionality, and have never been read by our server software in Ruby On Rails before. To convert between the original system and the new system, I've started work on parsing the existing JSON files.

The problem is that one of my first two sample files has trailing commas in the JSON:

{ "sample data": [1, 2, 3,] }

This apparently went through just fine with the iOS app, because that file has been in use for a while. Now I need some way to parse the data provided in the file in my Ruby on Rails server, which (quite rightfully) throws an exception over the illegal trailing comma in the JSON file.

So a plain JSON.parse is out. Is there some way to parse the file anyway: an option I can pass to JSON.parse, a gem that adds lenient parsing, or something similar? Or do I need to report back that we're going to have to hand-fix the broken files before the automated process can handle them?

Edit:

Based on comments and requests, it looks like some additional data is called for. The JSON files in question are stored in .zip files on S3, stored via ActiveStorage. The process I'm writing needs to download, unpack, and parse the zip files, using the 'manifest.json' file as a key to convert the archived file into a database structure with multiple, smaller files stored on S3 instead of a single zip that contains everything. A (very) long term goal is for clients to stop downloading a unitary zip file, and instead download the files individually. The first step towards that is to break the zip files up on the server, which means the server needs to read in the zip files. A more detailed sample of the data follows. (Note that the structure contains several design decisions I later came to regret; one of the original ideas was to be able to re-use files rather than pack multiple copies of the same identical file, but YAGNI bit me in the rear there)

The following includes comments that are not legal in JSON format:

{
  "defined_key": [
    {
      "name": "Object_with_subkeys",
      "key": "filename",
      "subkeys": [
        {
          "id":"1"
        },
        {
          "id":"2"
        },
        {
          "id":"3" // references to identifier on another defined key
        }, // Note trailing comma
      ]
    }
  ],
  "another_defined_key":[
    {
      "identifier": "should have made parent a hash with id as key instead of an array",
      "data":"metadata",
      "display_name":"Names: Can be very arbitrary",
      "user text":"Wait for the right {moment}", // I actually don't expect { or } in the strings, but they're completely legal and may have been used
      "thumbnail":"filename-2.png",
      "video-1":"filename-3.mov"
    }
  ]
}

Solution

  • The problem is that you are trying to parse something that looks a lot like JSON but is not actually JSON as defined by the spec.

    Arrays: "An array structure is a pair of square bracket tokens surrounding zero or more values. The values are separated by commas."

    Because commas only separate values, the parser expects another value after your trailing comma, and most JSON parsers raise an error when none follows.

    That said, the json-next gem will parse this input, so it may be worth a try.

    Depending on the flavor you use (HanSON, SON, or JSONX, as defined in the gem), it can parse JSON-like representations that completely violate the JSON spec.

    Example:

    require 'json/next'

    # Input with an illegal trailing comma inside the array
    json = "{ \"sample data\": [1, 2, 3,] }"
    HANSON.parse(json)
    #=> {"sample data"=>[1, 2, 3]}
    

    The following returns the same result, even though the input has no commas between the values at all and so completely violates the spec:

    JSONX.parse("{ \"sample data\": [1 2 3] }")
    #=> {"sample data"=>[1, 2, 3]} 
    

    So if you choose this route, do not expect it to validate the JSON data or structure in any fashion; a parser this lenient can accept malformed input and give you unintended results.
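
If validation still matters, one possible alternative is to clean up only the two known defects (comments and trailing commas) and then hand the result to the strict parser. The sketch below uses only Ruby's standard library. The names `parse_lenient_json` and `drop_outside_strings` are hypothetical helpers written for this illustration, not part of json-next or the json gem, and it assumes the files use only `//` and `/* */` comments and trailing commas as deviations. String literals are copied untouched, so braces, commas, and slashes inside values such as "Wait for the right {moment}" are left alone.

```ruby
require 'json'
require 'strscan'

# Patterns for this sketch (assumed to cover the deviations in the old files).
STRING_LITERAL = /"(?:\\.|[^"\\])*"/m        # a complete JSON string, escapes included
COMMENT        = %r{//[^\n]*|/\*.*?\*/}m     # line comment or block comment
TRAILING_COMMA = /,(?=\s*[\]}])/             # comma followed only by whitespace and ] or }

# Hypothetical helper: returns text with every match of pattern removed,
# except where the match would fall inside a string literal.
def drop_outside_strings(text, pattern)
  scanner = StringScanner.new(text)
  cleaned = +''
  until scanner.eos?
    if scanner.scan(STRING_LITERAL)
      cleaned << scanner.matched   # keep string literals byte for byte
    elsif scanner.scan(pattern)
      next                         # drop the comment or trailing comma
    else
      cleaned << scanner.getch     # copy any other character as-is
    end
  end
  cleaned
end

# Hypothetical helper: strips comments first (so a comma followed by a
# comment and then a closing bracket is still seen as trailing), then
# strips trailing commas, then parses strictly.
def parse_lenient_json(text)
  without_comments = drop_outside_strings(text, COMMENT)
  JSON.parse(drop_outside_strings(without_comments, TRAILING_COMMA))
end

data = parse_lenient_json('{ "sample data": [1, 2, 3,] }')
data['sample data'] #=> [1, 2, 3]
```

Anything else that is wrong with a file, such as missing commas or unquoted keys, still raises JSON::ParserError, so the broken files that need hand-fixing are reported rather than silently accepted.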