node.js shopify shopify-app jsonlines

How to read JSONL line by line after requesting a URL in Node.js?


From the Shopify API, I receive a link to a large JSONL file. Using Node.js, I need to read this data line by line, because loading it all at once would use a lot of memory. When I open the JSONL URL in a web browser, it automatically downloads the file to my downloads folder.

Example of JSONL:

{"id":"gid:\/\/shopify\/Customer\/6478758936817","firstName":"Joe"}
{"id":"gid:\/\/shopify\/Order\/5044232028401","name":"#1001","createdAt":"2022-09-16T16:30:50Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Order\/5044244480241","name":"#1003","createdAt":"2022-09-16T16:37:27Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Order\/5057425703153","name":"#1006","createdAt":"2022-09-27T17:24:39Z","__parentId":"gid:\/\/shopify\/Customer\/6478758936817"}
{"id":"gid:\/\/shopify\/Customer\/6478771093745","firstName":"John"}
{"id":"gid:\/\/shopify\/Customer\/6478771126513","firstName":"Jane"}

I'm unsure how to process this data in Node.js. Do I need to request the URL, download all of the data into a temporary file, and then process that file line by line? Or can I read the data line by line directly from the response (via some sort of stream) and process it without storing a temporary file on the server?

(The JSONL comes from https://storage.googleapis.com/ if that helps.)

Thanks.


Solution

  • Using axios, you can ask for the response as a stream (responseType: 'stream'), and then use the built-in readline module to process the data line by line, without a temporary file.

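    // Top-level await requires an ES module (.mjs or "type": "module")
    import axios from 'axios'
    import { createInterface } from 'node:readline'

    // Ask axios for a Node.js stream instead of buffering the whole body
    const response = await axios.get('https://raw.githubusercontent.com/zaibacu/thesaurus/master/en_thesaurus.jsonl', {
      responseType: 'stream'
    })

    // readline splits the incoming stream into lines as the data arrives
    const rl = createInterface({
      input: response.data,
      crlfDelay: Infinity // treat \r\n as a single line break
    })

    for await (const line of rl) {
      if (!line.trim()) continue // skip blank lines
      // do something with the current line
      const { word, synonyms } = JSON.parse(line)
      console.log('word, synonyms: ', word, synonyms)
    }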
    

    When I tested this, there was barely any memory usage, since the response is streamed rather than loaded all at once.
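
For the Shopify data in the question, you will probably also want to attach each order to its customer while streaming. Here is a minimal sketch of one way to do that. It is not taken from Shopify's documentation: `createParentGrouper` is a hypothetical helper name, and it assumes that every child line comes after its parent line (as in the sample in the question) and that there is only one level of nesting. Only one customer and its orders are held in memory at a time.

```javascript
// Hypothetical helper: groups child records under the most recent parent.
// Assumption: each child line follows its parent line, one nesting level only.
function createParentGrouper(onGroup) {
  let current = null // the parent group currently being assembled

  return {
    // Feed one raw JSONL line.
    push(line) {
      if (line.trim() === '') return // tolerate blank lines
      const record = JSON.parse(line)
      if (record.__parentId === undefined) {
        // A new top-level record starts, so emit the previous group first.
        if (current) onGroup(current)
        current = { parent: record, children: [] }
      } else if (current && record.__parentId === current.parent.id) {
        current.children.push(record)
      } else {
        throw new Error(`Record ${record.id} does not follow its parent`)
      }
    },
    // Call once after the last line to emit the final group.
    end() {
      if (current) onGroup(current)
      current = null
    }
  }
}

// Demo with in-memory lines (shortened, made-up ids).
const sampleLines = [
  '{"id":"gid://shopify/Customer/1","firstName":"Joe"}',
  '{"id":"gid://shopify/Order/10","name":"#1001","__parentId":"gid://shopify/Customer/1"}',
  '{"id":"gid://shopify/Order/11","name":"#1003","__parentId":"gid://shopify/Customer/1"}',
  '{"id":"gid://shopify/Customer/2","firstName":"John"}'
]

const groups = []
const grouper = createParentGrouper((group) => groups.push(group))
for (const line of sampleLines) grouper.push(line)
grouper.end()

for (const { parent, children } of groups) {
  console.log(parent.firstName, children.map((order) => order.name))
}
```

In the streaming version, you would call `grouper.push(line)` inside the `for await (const line of rl)` loop and `grouper.end()` after it, and do the real work (for example a database write) in the `onGroup` callback instead of collecting groups in an array.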