I am trying to build an application that contains an instant messaging module, and one of the main challenges is keeping the application scalable regardless of the number of users or messages exchanged.
I read in an article that it is possible to build real-time applications using GraphQL “subscriptions”. In addition, GraphQL is a simple protocol to use and minimizes round trips when retrieving objects, and hence uses fewer resources.
But what if we need to add a new server/node to the system in order to scale horizontally? Is this possible using GraphQL?
As an example of a WebSocket implementation that allows horizontal scaling, there is SocketCluster. I wonder whether an application built with GraphQL alone can scale across multiple nodes/machines, or whether it must be combined with another framework like SocketCluster to achieve this.
In short: yes. We have done it, and it works pretty well.
The trick is that, when it comes to horizontal scaling, you have to think beyond just the API worker applications. If you want a push architecture, it needs to be asynchronous from the very beginning.
To achieve this, we used a queueing system, namely RabbitMQ.
Imagine this scenario of report generation, which can take up to 10 minutes:
[...] ReportGenerationDoneEvent and check if anybody is listening for its token. [...] ReportGenerationDoneEvent.
It is quite extensive, but with simple abstractions you do not have to think about all this complexity, and a new process using this route takes only ~30 lines of code across several services.
What is brilliant about this is that you end up with nice horizontal scaling, event replayability (retries), separation of concerns (client, API, workers), and data pushed out to the client as quickly as possible. And, as you mentioned, you do not waste bandwidth on "are we done yet?" requests.
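To make the flow more concrete, here is a minimal sketch of the idea, not our actual code. It assumes an in-memory queue as a stand-in for RabbitMQ, and the names ApiNode, run_worker and report_url are hypothetical. A job is queued under a token, a worker finishes it and emits a ReportGenerationDoneEvent, and every API node checks whether anybody connected to it is listening for that token.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class ReportGenerationDoneEvent:
    token: str       # correlates the event with the original request
    report_url: str  # hypothetical payload field


@dataclass
class ApiNode:
    """One API instance; it only knows the subscriptions opened on it."""
    name: str
    listeners: dict = field(default_factory=dict)  # token -> pushed messages

    def subscribe(self, token):
        # Stand-in for a client opening a GraphQL subscription on this node.
        self.listeners[token] = []
        return self.listeners[token]

    def on_event(self, event):
        # Check whether anybody connected to this node listens for the token.
        inbox = self.listeners.get(event.token)
        if inbox is None:
            return False
        inbox.append(event.report_url)  # "push" the result to the client
        return True


def run_worker(jobs, nodes):
    """Drain the job queue and broadcast one done event per job."""
    while not jobs.empty():
        token = jobs.get()
        event = ReportGenerationDoneEvent(token, f"/reports/{token}.pdf")
        for node in nodes:  # broadcast to the whole API cluster
            node.on_event(event)


jobs = queue.Queue()  # in-memory stand-in for a RabbitMQ queue
nodes = [ApiNode("api-1"), ApiNode("api-2")]

inbox = nodes[1].subscribe("tok-42")  # the client happens to be on api-2
jobs.put("tok-42")                    # the API enqueues the job and returns
run_worker(jobs, nodes)
print(inbox)  # ['/reports/tok-42.pdf']
```

The client never polls: the only traffic after the initial request is the single pushed message on the node that holds its subscription.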
Another cool thing is that whenever a user opens the reports list in our panel, they see the reports currently being generated and can subscribe to their changes, so they do not have to refresh the list manually.
Good thinking on SocketCluster. It would optimize step 10 in the above scenario, but for now we do not see any performance issues with broadcasting the ReportGenerationDoneEvent to the whole API cluster. With more instances or a multi-region architecture it would be a must, as it would allow for better scaling and sharding.
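A rough way to see why broadcasting stops scaling is to count deliveries. The sketch below is only an illustration: the Cluster class and its token-to-node directory are hypothetical and do not describe how SocketCluster works internally. It compares sending an event to every node with sending it only to the node that holds the subscription.

```python
class Cluster:
    def __init__(self, node_names):
        self.subscriptions = {name: set() for name in node_names}  # node -> tokens
        self.directory = {}  # token -> node name (hypothetical routing table)
        self.deliveries = 0  # how many nodes had to handle an event

    def subscribe(self, node, token):
        self.subscriptions[node].add(token)
        self.directory[token] = node

    def _deliver(self, node, token):
        self.deliveries += 1
        return token in self.subscriptions[node]

    def broadcast(self, token):
        # Every node receives the event; most of them discard it.
        return [n for n in self.subscriptions if self._deliver(n, token)]

    def route(self, token):
        # Only the node that owns the subscription receives the event.
        node = self.directory.get(token)
        if node is None:
            return []
        return [node] if self._deliver(node, token) else []


cluster = Cluster(["api-1", "api-2", "api-3", "api-4"])
cluster.subscribe("api-3", "tok-42")

hit_by_broadcast = cluster.broadcast("tok-42")
broadcast_cost = cluster.deliveries  # one delivery per node in the cluster

cluster.deliveries = 0
hit_by_route = cluster.route("tok-42")
route_cost = cluster.deliveries      # a single delivery

print(hit_by_broadcast, broadcast_cost)  # ['api-3'] 4
print(hit_by_route, route_cost)          # ['api-3'] 1
```

With a handful of instances the extra deliveries are negligible, which matches what we see today; the cost grows linearly with the number of nodes.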
It is important to understand that SocketCluster operates at the communication layer (WebSockets), while the logical API layer (GraphQL) sits above it. To make a GraphQL Subscription, you just have to use a communication protocol that allows you to push information to the user, and WebSockets allow that.
I think using SocketCluster is a good design choice, but remember to implement it iteratively. Only use SocketCluster when you plan to have many sockets open at any single point in time. Also, you should subscribe only when necessary, because a WebSocket connection is stateful and requires management and heartbeats.
If you are interested in learning more about the asynchronous backend architecture I described above, read up on the CQRS and Event Sourcing patterns.
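To illustrate what "management and heartbeats" means in practice, here is a small sketch of the bookkeeping a server has to do for every open socket. The ConnectionRegistry class and the 30-second timeout are assumptions for illustration only, and time is passed in explicitly so the example is deterministic.

```python
class ConnectionRegistry:
    """Tracks open connections and drops the ones that stopped responding."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # connection id -> time of the last heartbeat

    def heartbeat(self, conn_id, now):
        # Called when a connection opens and on every ping from the client.
        self.last_seen[conn_id] = now

    def reap(self, now):
        # Remove and return connections silent for longer than the timeout.
        stale = [c for c, t in self.last_seen.items() if now - t > self.timeout_s]
        for conn_id in stale:
            del self.last_seen[conn_id]
        return stale


registry = ConnectionRegistry(timeout_s=30.0)
registry.heartbeat("client-a", now=0.0)
registry.heartbeat("client-b", now=0.0)
registry.heartbeat("client-a", now=25.0)  # only client-a keeps pinging

dropped = registry.reap(now=40.0)
print(dropped)                   # ['client-b']
print(list(registry.last_seen))  # ['client-a']
```

Every open subscription carries this kind of state on the server, which is why it pays to open a socket only when the user actually needs live updates.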