mongodb bson

Create array of objects from bsondump exported json


I used bsondump to export a huge (69GB) file to json. I expected to get a valid json array, but instead the output is a sequence of top-level objects with no enclosing brackets and no commas between them.

There is an option to create a json array using mongoexport. But this bson file was exported from another machine, and due to size and performance considerations I do not want to import this large file before I can use mongoexport to export it from the db instead.

How can I export a valid json array using bsondump?

EDIT

To give more background on why I need to convert a bson-based mongodb export to json:

1) I was trying to use mongoexport to export a json directly from mongodb. Just like this:

mongoexport -d mydb -c notifications --jsonArray -o lv.json

The problem with this is that the export shows no progress, and it runs significantly slower than mongodump (it never finished; I had to stop it each time). It also puts significant strain on a production server. As I stated in my original question, it's not an option for that reason.

2) mongodump works way faster, likely because it doesn't have to convert to json and just dumps the internal data. It also showed progress, so I knew when it would finish. So that's the only thing I could run on the production server.

mongodump --db mydb

Edit 2

After exporting to .bson it is then possible to use bsondump to convert the .bson file into a .json file:

bsondump mydata.bson > mydata.json

To make the point clear here: bsondump has no --jsonArray option like mongoexport. So it cannot export a valid json array, but instead dumps multiple root objects into one file. The result is not a valid json document, so one would have to pre-parse it.

/Edit2

3) I basically have two options: import the bson dump into a local db and export it to a proper json file using mongoexport --jsonArray, or find a way around bsondump itself not being able to export a proper json array file. A third option, implementing a bson parser in my tool, is something that I'm not really keen on...

The large file size is not a problem for my tool. My tool is written in C++ and specialized for large data streams. I use rapidjson with a SAX parser under the hood, and filter out records with my own SQL-like evaluator. Memory usage is usually below 10MB since I use a SAX parser instead of a DOM.


Solution

  • To answer my own question: bsondump currently lacks an option to create a json array as output (like mongoexport's --jsonArray option). I've created a feature request [1], and maybe it will be added in a future version of bsondump.

    Meanwhile, I've created a small tool for my purpose which converts my data into a json array.

    [1] https://jira.mongodb.org/browse/TOOLS-1734
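
The conversion tool mentioned above isn't shown here. As a rough illustration of the idea only (not the author's tool), the sketch below wraps bsondump's output in a json array while streaming, so memory use stays small regardless of file size. It assumes bsondump writes exactly one json document per line, which is the case as long as pretty-printing is not enabled; the function name write_json_array and the script name to_array.py are made up for this example.

```python
import sys
from typing import Iterable, TextIO


def write_json_array(lines: Iterable[str], out: TextIO) -> int:
    """Wrap newline-delimited json documents in a single json array.

    Assumption: every non-empty input line is one complete json document.
    Lines are copied through unparsed, so only one line is held in memory.
    Returns the number of documents written.
    """
    count = 0
    out.write("[")
    for line in lines:
        doc = line.strip()
        if not doc:
            continue  # skip blank lines
        # A comma goes before every document except the first one.
        out.write(",\n" if count else "\n")
        out.write(doc)
        count += 1
    out.write("\n]\n")
    return count


if __name__ == "__main__":
    # Read from stdin and write to stdout so the script can sit in a pipe.
    write_json_array(sys.stdin, sys.stdout)
```

It could be used in a pipe, which also avoids writing the intermediate invalid file:

bsondump mydata.bson | python3 to_array.py > mydata_array.json

Since the lines are never parsed, a malformed or multi-line document would pass through unchanged and still yield invalid output, so this only works under the one-document-per-line assumption.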