javascript, node.js, multithreading, node-worker-threads

Managing multiple long-running tasks concurrently in JS (Node.js)


Golang developer here, trying to learn JS (Node.js).

I'm used to working with goroutines in Go, which for the sake of simplicity let's assume are just threads (actually they're not exactly threads, more like Green Threads, but bear with me!).

Imagine now that I want to create some kind of service that can run some endlessTask which, for example, could be a function that receives data from a websocket and keeps an internal state updated, which can be queried later on. Now, I want to be able to serve multiple users at the same time and each of them can also stop their specific ongoing task at some point. In Go, I could just spawn a goroutine for my endlessTask and store some kind of session in the request dispatcher to keep track of which user each task belongs to.

How can I implement something like this in JS? I looked through the Node.js API documentation and found some interesting things, worker threads among them.

I'm not sure how I could handle this scenario without multi-threading or multi-processing. Would the worker threads solution be viable in this case?

Any input or suggestion would be appreciated. Thanks!


Solution

  • Imagine now that I want to create some kind of service that can run some endlessTask which, for example, could be a function that receives data from a websocket and keeps an internal state updated

    So, rather than threads, you need to be thinking in terms of events and event handlers, since that's the core of the nodejs architecture, particularly for I/O. If you want to be able to read incoming webSocket data and update some internal state when it arrives, all you do is set up an event handler for the incoming webSocket data. That event handler will then get called any time there's data waiting to be read and the interpreter has returned to the event loop.

    You don't have to create any thread structure for that or any type of loop or anything like that. Just add the right event handler and let it call you when there's incoming data available.
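
    For example, here's a minimal sketch of that pattern using the third-party ws package (the library choice is just an assumption for illustration; any webSocket implementation with an event-style API works the same way):

        // npm install ws
        const { WebSocketServer } = require('ws');

        // internal state that the event handler keeps updated
        let latestMessage = null;

        const wss = new WebSocketServer({ port: 8080 });

        wss.on('connection', (socket) => {
            // the event loop calls this whenever data is waiting to be read;
            // no blocking read loop and no thread needed
            socket.on('message', (data) => {
                latestMessage = data.toString();
            });
        });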

    Now, I want to be able to serve multiple users at the same time and each of them can also stop their specific ongoing task at some point.

    Just add an event listener to each webSocket and your nodejs server will easily serve multiple users. When a user disconnects their webSocket, the listener automatically goes away with it. There's nothing else to do or clean up in that regard unless you want to update the internal state, in which case you can also listen for the disconnect event.
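
    Continuing the ws-based sketch above, listening for the disconnect looks like this (in the ws API the event is called 'close'; other libraries may name it differently):

        wss.on('connection', (socket) => {
            socket.on('message', (data) => {
                // ...update this user's state as data arrives...
            });

            // fires when the user disconnects (or when you call socket.close());
            // the 'message' listener above goes away along with the socket
            socket.on('close', () => {
                // ...update or discard this user's state here if needed...
            });
        });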

    In Go, I could just spawn a goroutine for my endlessTask and store some kind of session in the request dispatcher to keep track of which user each task belongs to.

    I don't know goroutines, but there are lots of options for storing the user state. If it's just info that you need to be able to get to when you already have the webSocket and don't need it to persist beyond that, then you can just add the state directly to the webSocket object. That object will be available any time you get a webSocket event, so you can always have it there to update when there's incoming data. You can also put the state other places (a database, or a Map object that's indexed by socket, by username, or by whatever you need to be able to look it up by) - it really depends on what exactly the state is.
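
    As an illustration, here's a sketch of the Map approach, continuing the sketch above and keyed by username; how the username is obtained (a query parameter here) is just a placeholder assumption:

        // state that needs to be looked up outside of socket events
        const stateByUser = new Map();

        wss.on('connection', (socket, request) => {
            // hypothetical: identify the user, e.g. ws://host:8080/?user=alice
            const username = new URL(request.url, 'http://localhost').searchParams.get('user');
            const state = { messagesReceived: 0 };
            stateByUser.set(username, state);

            socket.on('message', () => {
                state.messagesReceived++;
            });

            socket.on('close', () => {
                stateByUser.delete(username);
            });
        });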

    I'm not sure how I could handle this scenario without multi-threading or multi-processing. Would the worker threads solution be viable in this case?

    What you have described doesn't sound like anything that would require clustering, child processes or worker threads unless something you're doing with the data is CPU intensive. Just using event listeners for incoming data on each webSocket will let nodejs' very efficient and asynchronous I/O handling kick into gear. This is one of the things it is best at.

    Keep in mind that I/O in nodejs may be a little inside-out from what you're used to. You don't create a blocking read loop waiting for incoming data on the webSocket. Instead, you just set up an event listener for incoming data and it will call you when incoming data is available.


    The time you would involve clustering, child processes or Worker Threads is when you have more CPU processing in your Javascript to process the incoming data than a single core can handle. I would only go there if/when you've proven you have a scalability issue with the CPU usage in your nodejs server. Then, you'd want to pursue an architecture that adds just a few other processes or threads to share the load (not one per connection).

    If you have specific CPU-heavy operations (custom encryption or compression are classic examples), then it may help to create a few other processes or Worker Threads that just handle a work queue for the CPU-heavy work. Or, if it's simply a matter of increasing the overall CPU cycles available to process incoming data, then you would probably go to clustering: let each incoming webSocket get assigned to a cluster worker and still use the same event handling logic described previously, but now the webSockets are split across several processes so you have more CPU to throw at them.
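
    If you do get to that point, here's a rough sketch of the work-queue idea with a single Worker Thread; the heavyComputation function and the one-file setup are placeholders for whatever your CPU-heavy step actually is:

        const { Worker, isMainThread, parentPort } = require('worker_threads');

        if (isMainThread) {
            // main thread: keeps handling webSocket I/O, offloads heavy jobs
            const worker = new Worker(__filename);
            const pending = new Map();
            let nextId = 0;

            worker.on('message', ({ id, result }) => {
                pending.get(id)(result);
                pending.delete(id);
            });

            // call this from a webSocket 'message' handler for CPU-heavy data
            function runHeavyJob(payload) {
                return new Promise((resolve) => {
                    const id = nextId++;
                    pending.set(id, resolve);
                    worker.postMessage({ id, payload });
                });
            }

            module.exports = { runHeavyJob };
        } else {
            // worker thread: processes jobs from its message queue one at a time
            parentPort.on('message', ({ id, payload }) => {
                const result = heavyComputation(payload);
                parentPort.postMessage({ id, result });
            });

            function heavyComputation(payload) {
                // ...custom encryption, compression, parsing, etc...
                return payload;
            }
        }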