node.js json bigdata file parsing

How to parse a Really Big Incorrectly Saved JSON file to fix it


I was working with a big dataset of approximately a million items and took a dump from a database using streams. However, I mistakenly left out the opening and closing square brackets that mark a JSON array and just pushed the objects without them.

Now I want to fix the JSON file so that other software can process it, but because of the file size (20.01 GB) I am running into buffer and memory issues, which is expected at that size. Is there a way to fix this file?

PS: I don't want to run such a big, expensive query against the database again just to get a fresh dump.

Syntax of data in current file:

{ "name": "aaron", "age": 21 },{ "name": "jen", "age": 26 }

Expected Syntax of data in file:

[{ "name": "aaron", "age": 21 },{ "name": "jen", "age": 26 }]

Solution

  • If you just need to slap brackets on the front and back, use echo and cat:

    echo '[' > fixed.json
    cat broken.json >> fixed.json
    echo ']' >> fixed.json
    

    You could do the same thing with Node, obviously, by reading in the file stream, and prefixing and postfixing the output accordingly. Since it sounds like this is a one-off mistake, a quick fix is likely the more appropriate approach.
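The Node approach mentioned above could look roughly like the following. This is a minimal sketch, not a tested production script: it assumes a Node version that ships `stream/promises` (15 or later), the helper name `wrapInBrackets` is made up for illustration, and it assumes the file needs nothing more than the two brackets (for example, no trailing comma after the last object).

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: copies inputPath to outputPath, adding '[' before
// and ']' after the original contents. The file is streamed chunk by
// chunk, so it is never held in memory as a whole.
async function wrapInBrackets(inputPath, outputPath) {
  // Async generator used as the pipeline source: opening bracket,
  // then every chunk of the original file, then the closing bracket.
  async function* wrapped() {
    yield '[';
    yield* fs.createReadStream(inputPath);
    yield ']';
  }

  // pipeline() handles backpressure and rejects if either stream errors.
  await pipeline(wrapped, fs.createWriteStream(outputPath));
}
```

Calling `wrapInBrackets('broken.json', 'fixed.json')` and waiting for the returned promise should produce the same result as the shell commands, apart from the newlines that `echo` adds around the brackets.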