I'm trying to build a REST API using Falcon. I have a POST endpoint where I receive a JSON payload (sometimes with hundreds of keys) and attempt to process the data.
The load is under 50 requests/second, but some requests can take minutes to process because of the number of keys and the calls to other APIs (a single request can trigger 100 × 3 API calls, plus the same number of inserts into the database / Elasticsearch).
Given these circumstances, Falcon fails to resolve most of my requests. I'm serving the app with gunicorn, and I also tried serving it with `serve_forever`. I've tried using gevent, but with no success.
I should mention that this service runs in a Docker container.
Is there a scaling setting in Falcon I'm missing, or is this a design flaw on my part?
Just to clarify: scaling is normally governed by the application server. As a WSGI/ASGI application framework, Falcon simply provides a callable that renders a response to a given request according to the WSGI (or ASGI) protocol spec. As such, Falcon itself has no scaling settings.
You need to determine whether your service is CPU bound or I/O bound (see also: What do the terms "CPU bound" and "I/O bound" mean?). If it is truly CPU bound, you may need to add more processing power to your server(s). If it is I/O bound (waiting on other APIs and databases), you may be able to address the problem by performing that I/O in parallel, either with more worker processes/threads or with an asynchronous event loop:
Note that a large number of processes and threads might have a considerable scheduling overhead. Furthermore, threads incur an extra performance penalty in Python due to the GIL (What is the global interpreter lock (GIL) in CPython?).
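To illustrate the thread-based approach, here is a minimal sketch of overlapping per-key I/O with a thread pool. `call_upstream_api` is a hypothetical stand-in for your real API/DB calls, with a sleep simulating network latency:

```python
import concurrent.futures
import time

def call_upstream_api(key):
    # Hypothetical placeholder for a real API or DB call;
    # time.sleep simulates the network round-trip.
    time.sleep(0.1)
    return key, "ok"

def process_payload(keys):
    # Fan the per-key calls out over a thread pool instead of issuing
    # them sequentially; threads overlap the I/O waits, so total time
    # is roughly one round-trip per batch rather than one per key.
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
        return dict(pool.map(call_upstream_api, keys))

results = process_payload(["a", "b", "c"])
```

Because the threads spend nearly all their time blocked on I/O, the GIL is released while they wait, so this helps even in CPython.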
Asynchronous event loops such as Gevent and asyncio usually do better at massively parallel I/O; however, then you need to make sure your I/O (such as API and DB) calls are compatible with the async technology of choice.
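A comparable asyncio sketch, with `asyncio.sleep` standing in for an async-compatible HTTP/DB client call (e.g. aiohttp or httpx in a real service):

```python
import asyncio

async def call_upstream_api(key):
    # Hypothetical placeholder for an async API or DB call;
    # asyncio.sleep simulates the network wait without blocking the loop.
    await asyncio.sleep(0.1)
    return key, "ok"

async def process_payload(keys):
    # asyncio.gather schedules all the coroutines concurrently on one
    # thread, so hundreds of pending calls cost roughly one round-trip.
    pairs = await asyncio.gather(*(call_upstream_api(k) for k in keys))
    return dict(pairs)

results = asyncio.run(process_payload(["a", "b", "c"]))
```

The caveat from above applies: if any call inside the coroutine blocks synchronously, it stalls the entire event loop.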
Another thing to watch out for is timeout settings across application servers, load balancers, reverse proxies, etc. For instance, Gunicorn's default worker timeout is 30 seconds, which means that, with the default `sync` worker class, requests taking longer than 30 seconds never get a chance to complete.
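As an illustration, a `gunicorn.conf.py` raising the timeout and switching to gevent workers might look like this (the values are illustrative starting points, not recommendations):

```python
# gunicorn.conf.py -- illustrative values, tune for your workload
workers = 4                # a common starting point is one per CPU core
worker_class = "gevent"    # cooperative workers for I/O-bound requests
timeout = 300              # allow slow requests up to 5 minutes
```

Remember that any proxy in front of gunicorn (nginx, a load balancer) has its own timeouts that must be raised as well, or it will drop the connection first.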