Tags: python, multithreading, flask

Correct way to parallelize request processing in Flask


I have a Flask service that receives GET requests, and I want to scale the QPS on that endpoint (on a single machine/container). Should I use a Python ThreadPoolExecutor or ProcessPoolExecutor, or something else? The GET request just retrieves small pieces of data from a cache backed by a DB. Is there anything specific to Flask that should be taken into account?


Solution

  • Neither.

    Flask will serve one request per worker (or more, depending on the worker type). The way you set it up, whether with gunicorn or another WSGI or ASGI server, is what determines the number of parallel requests your app can process.

    Inside your app, you don't change anything: your views will be called as independent processes, independent threads, or independent async tasks, depending on how you set up your server. That is where you have to tinker with the configuration.
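    For example, with gunicorn the degree of parallelism is set entirely by server flags, not by anything inside the Flask code (the worker and thread counts below are illustrative, and "app:app" assumes your Flask instance is named app in app.py):

    ```shell
    # 4 worker processes, each running 2 threads: up to 8 concurrent requests.
    gunicorn --workers 4 --threads 2 app:app

    # For I/O-bound views (cache/DB lookups), gevent workers can
    # multiplex many concurrent requests per process instead:
    gunicorn --workers 4 --worker-class gevent app:app
    ```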

    Using another concurrency strategy inside the app would only make sense if a single request involved calculations or data fetches that could themselves be parallelized, within that one request.
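    A minimal sketch of that one case, assuming a view that must make several independent slow lookups (fetch_one and the keys here are hypothetical stand-ins): a ThreadPoolExecutor can overlap the I/O waits inside the single request, while the server still decides how many requests run in parallel.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fetch_one(key):
        # Hypothetical stand-in for a slow cache/DB lookup.
        return {"key": key, "value": key.upper()}

    def fetch_many(keys):
        # Overlap the independent lookups within ONE request;
        # this does not change how many requests Flask handles at once.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(fetch_one, keys))

    # Inside a view you would call: results = fetch_many(["a", "b"])
    ```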

    Check how your deployment is configured, and pick the best option for you (all things being equal, pick the easiest one): https://flask.palletsprojects.com/en/stable/deploying/ (Also, I would not recommend "mod_wsgi" among those options: it is super complex and old tech.)