I was working with a large dataset of approximately a million items and dumped it from a database using streams. However, I forgot the opening and closing square brackets that mark a JSON array and just wrote the objects without them.
Now I want to fix the JSON file so that it can be processed by other software, but because of the file size (20.01 GB) I am getting buffer and memory errors, which is expected for a file this large. Is there a way to fix this file?
PS: I don't want to run such a big and expensive query on the database again for a fresh dump.
Syntax of data in current file:
{ "name": "aaron", age: 21 },{ "name": "jen", age: 26 }
Expected Syntax of data in file:
[{ "name": "aaron", age: 21 },{ "name": "jen", age: 26 }]
If you just need to slap brackets on the front and end, use cat:
echo '[' > fixed.json
cat broken.json >> fixed.json
echo ']' >> fixed.json
You could do the same thing with Node, obviously, by reading the file in as a stream and writing a prefix and a suffix around the output. Since it sounds like this is a one-off mistake, a quick fix is likely the more appropriate approach.
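For completeness, the Node variant might look roughly like the sketch below. It assumes a Node version that ships `stream/promises`. The function name `wrapInBrackets` and the file names are made up for illustration. The file is copied chunk by chunk, so memory use should stay small regardless of file size.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: copies inputPath to outputPath with '[' written
// before the data and ']' written after it.
async function wrapInBrackets(inputPath, outputPath) {
  await pipeline(
    // An async generator acts as the source: it yields the prefix, then
    // every chunk of the input file as it is read, then the suffix.
    async function* () {
      yield '[';
      yield* fs.createReadStream(inputPath);
      yield ']';
    },
    // pipeline() handles backpressure, so chunks are only read as fast
    // as they can be written out.
    fs.createWriteStream(outputPath)
  );
}

// Example call (file names are assumptions):
// wrapInBrackets('broken.json', 'fixed.json').catch(console.error);
```

Like the cat version, this only adds the brackets. If the dump ends with a trailing comma, the result will still not be valid JSON and that comma has to be dealt with separately.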