node.js, dataset, nodejs-stream, highland.js, nodejs-server

How to process huge array of objects in nodejs


I want to process an array of around 100,000 objects without putting too much load on the CPU. I researched streams and stumbled upon highland.js, but I am unable to make it work.

I also tried using promises and processing in chunks, but it still puts a heavy load on the CPU. The program can be slow if needed, but it should not load the CPU.


Solution

  • Because node.js runs your Javascript on a single thread, if you want your server to be maximally responsive to incoming requests, you need to move any CPU-intensive code out of the main http server process. This means doing the CPU-intensive work in some other process.

    There are a bunch of different approaches to doing this:

    1. Use the child_process module to launch another nodejs app that is purpose-built for doing your CPU-intensive work.
    2. Cluster your app so that you have N different processes that can each do both CPU-intensive work and handle requests.
    3. Create a work queue and a number of worker processes that will handle the CPU-intensive work.
    4. Use the newer Worker Threads to move CPU-intensive work to separate node.js threads (requires node v12+ for the stable, non-experimental version of threads).

    If you don't do this CPU-intensive work very often, then #1 is probably simplest.
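    Here's a minimal sketch of #1 using `child_process.fork()`. The sum-of-squares loop is just a stand-in for whatever per-object work you actually do, and writing the worker script to a temp file is only so the example is self-contained; in a real app you'd keep the worker in its own file.

    ```javascript
    const { fork } = require('child_process');
    const fs = require('fs');
    const path = require('path');
    const os = require('os');

    // For a self-contained demo, write the worker script to a temp file.
    // In practice this would be a separate file in your project.
    const workerSrc = `
    process.on('message', (items) => {
      // The CPU-intensive loop runs here, in the child process,
      // so the parent's event loop stays free for http requests.
      const total = items.reduce((sum, n) => sum + n * n, 0);
      process.send(total);
      process.exit(0);
    });
    `;
    const workerPath = path.join(os.tmpdir(), 'demo-worker.js');
    fs.writeFileSync(workerPath, workerSrc);

    const child = fork(workerPath);
    child.on('message', (result) => {
      // Result arrives asynchronously; the parent never blocked.
      console.log('sum of squares:', result);
    });
    child.send([1, 2, 3, 4]);
    ```

    The parent and child communicate over the built-in IPC channel (`send`/`message`), so no manual serialization is needed for plain objects.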

    If you need to scale for other reasons (like handling lots of incoming requests) and you don't do the CPU-intensive stuff very often, then #2 is a good fit.
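    A sketch of #2 with the `cluster` module. Rather than starting an http server (which the real version would do), this demo has each worker sum one chunk of an array and report back, so you can see the fork/message flow; the data, chunking, and worker count are illustrative assumptions.

    ```javascript
    const cluster = require('cluster');

    // cluster.isPrimary is the modern name; isMaster on Node < 16
    const isPrimary = cluster.isPrimary ?? cluster.isMaster;
    let total = 0;

    if (isPrimary) {
      // Split the array into one chunk per worker process.
      const data = Array.from({ length: 100 }, (_, i) => i + 1);
      const numWorkers = 2;
      const chunkSize = Math.ceil(data.length / numWorkers);
      let finished = 0;
      for (let i = 0; i < numWorkers; i++) {
        const worker = cluster.fork();
        worker.on('message', (partial) => {
          total += partial;
          if (++finished === numWorkers) console.log('total:', total);
        });
        worker.send(data.slice(i * chunkSize, (i + 1) * chunkSize));
      }
    } else {
      // Worker: the same script runs again here, taking this branch.
      process.on('message', (chunk) => {
        // CPU-intensive part runs in this worker process.
        process.send(chunk.reduce((a, b) => a + b, 0));
        process.exit(0);
      });
    }
    ```

    In the real server version, each worker would call `http.createServer(...).listen(port)` on the same port; cluster distributes incoming connections among them.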

    If you do the CPU-intensive stuff fairly regularly, want incoming request processing to always have the highest priority, and are willing to let the CPU-intensive work take longer, then #3 (work queue) or #4 (threads) is probably best, and you can tune the number of workers to optimize your result.
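    And a sketch of #4 with `worker_threads`. The `eval: true` option lets the worker code be inlined as a string so the example is one file; normally you'd pass a file path. Again, the sum-of-squares loop is just a placeholder for your real per-object processing.

    ```javascript
    const { Worker } = require('worker_threads');

    // Worker code as a string, inlined via eval: true for this sketch.
    const workerCode = `
    const { parentPort, workerData } = require('worker_threads');
    // CPU-intensive work: runs on a separate thread, so the main
    // thread's event loop keeps servicing http requests.
    const total = workerData.reduce((sum, n) => sum + n * n, 0);
    parentPort.postMessage(total);
    `;

    const items = [1, 2, 3, 4];
    const worker = new Worker(workerCode, { eval: true, workerData: items });
    worker.on('message', (result) => {
      console.log('sum of squares:', result);
    });
    ```

    For the #3 (work queue) variant, you would keep a pool of such workers alive and feed them jobs one at a time instead of creating a worker per task, since thread startup has a real cost.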