node.js, large-files, json, stream

Best way to read a large JSON file


I currently have a 700 MB JSON file and always hit a memory limit when I try to read it (the goal is to import the data into Firestore using the Firestore Node.js SDK).

I tried the stream-json library; here is my current attempt:


const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

let totalSetCount = 0;

function importFile(file) {
  return fs.createReadStream(file)
    .pipe(parser())
    .pipe(streamArray())
    .on('data', async (row) => {
      // delete row.key;
      if (row.value && typeof row.value === 'object') {
        ++totalSetCount;
      }
    })
    .on('end', async () => {
      // Final batch commit and completion message.
      // await batchCommit(false);
      console.log(args.dryRun
        ? 'Dry-Run complete, Firestore was not updated.'
        : 'Import success, Firestore updated!'
      );
      console.log(`Total documents written: ${totalSetCount}`);
    });
}

Here is my error:

<--- Last few GCs --->

[63298:0x102682000]    66318 ms: Mark-sweep 1365.8 (1441.3) -> 1353.1 (1441.8) MB, 470.6 / 0.0 ms  (average mu = 0.212, current mu = 0.069) allocation failure scavenge might not succeed
[63298:0x102682000]    66796 ms: Mark-sweep 1366.4 (1442.3) -> 1352.1 (1443.3) MB, 446.4 / 0.0 ms  (average mu = 0.152, current mu = 0.065) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0xd54cf6dbe3d]
Security context: 0x364a2419e6e1 <JSObject>
    1: exec [0x364a24189231](this=0x364a321029a1 <JSRegExp <String[50]: [^\"\\]{1,256}|\\[bfnrt\"\\\/]|\\u[\da-fA-F]{4}|\">>,0x364aa7402201 <Very long string[65536]>)
    2: _processInput [0x364a32102a09] [/Users/mac-clement/Documents/projets/dpas/gcp/import-data/json-import/node_modules/stream-json/Parser.js:~107] [pc=0xd54cf9bb37b](this=0x364ac032ea19 <Tran...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003b125 node::Abort() [/usr/local/bin/node]
 2: 0x10003b32f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001a8e85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1005742a2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x100576d75 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
 6: 0x100572c1f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x100570df4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x10057d68c v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0x10057d70f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10054d054 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
11: 0x1007d4f24 v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0xd54cf6dbe3d
[1]    63298 abort      firestore-migrator i /Users/mac-clement/Downloads/wetransfer-ff44eb/5000.json

If you have any advice, I'd appreciate it.


Solution

  • You should probably use a SAX-style strategy and read the file piece by piece. A DOM-style strategy decodes the entire JSON file into a tree structure in memory, which a 700 MB file cannot afford. With a SAX-style strategy, the parser fires an event for each value (and its key) as it encounters it, so you can process items one at a time and let each be garbage-collected before the next arrives; see the sketch below.
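
Here is a minimal sketch of that SAX-style approach using stream-json (the library already in use above, per the stack trace). The input file name, the writeToFirestore helper, and the assumption that the file is one top-level JSON array are illustrative placeholders, not part of the original question:

const fs = require('fs');
const { pipeline, Writable } = require('stream');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

let totalSetCount = 0;

pipeline(
  fs.createReadStream('5000.json'),  // assumed input: one top-level JSON array
  parser(),                          // SAX-style tokenizer: emits one token at a time
  streamArray(),                     // assembles one array element at a time
  new Writable({
    objectMode: true,
    write({ key, value }, _encoding, callback) {
      // Only this one element is in memory. Calling back after the async
      // write completes applies backpressure to the whole pipeline, so
      // the parser never runs ahead of the slow Firestore writes.
      writeToFirestore(value)
        .then(() => { ++totalSetCount; callback(); })
        .catch(callback);
    },
  }),
  (err) => {
    if (err) return console.error('Import failed:', err);
    console.log(`Total documents written: ${totalSetCount}`);
  }
);

// Hypothetical stand-in for the real Firestore batch logic.
function writeToFirestore(doc) {
  return Promise.resolve(doc);
}

One design note: attaching an async 'data' handler, as in the question, gives no backpressure, because the stream keeps emitting while the promises are still pending. A Writable sink that delays its callback until the work is done does give backpressure, which keeps memory flat.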