I have a dashboard built on Plotly Dash. The dashboard updates in real time and involves a lot of processing of independent files. For example, there are five different time series in the dashboard, and they could feasibly update separately and in parallel because they are completely independent of one another.
I am hosting the dashboard locally on a Windows machine. Based on the commentary in the Dash Plotly forum, it sounds like the best way to get parallel processing is by using a job queue like Waitress or Celery.
What is the best tool to use to take advantage of parallel processing?
These are the three options I am considering: Waitress, Threading, and Celery.
TL;DR: I would recommend using Celery; a small example is given below.
The tools you have listed serve slightly different purposes:

- Waitress is a WSGI server, i.e. it can be used to serve the Dash application, or more specifically the underlying Flask server.
- Threading is a library for building threaded programs in Python.
- Celery is a distributed task queue.
That being said, all of the above tools provide some functionality related to concurrent processing in Dash:

1. You can enable concurrent execution of callbacks via configuration of your WSGI server (which could be Waitress, though gunicorn is a more popular choice; note that gunicorn does not run natively on Windows, so Waitress is a good fit in your case).
2. From within a callback, you can spin off the heavy part of a calculation to separate thread(s) using the Threading library.
3. You can use Celery to do the (async) heavy lifting.
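Option 2 can be sketched with the standard library alone. The helper names, the shared results dict, and the polling pattern below are illustrative, not from the original answer:

```python
# Sketch of option 2: spin off the heavy part of a callback to a background
# thread and collect the result later (e.g. polled via a dcc.Interval).
# All names here are illustrative.
import threading

results = {}  # shared store: job id -> result

def heavy_calculation(job_id: str, n: int) -> None:
    # stand-in for the expensive per-time-series processing
    results[job_id] = sum(i * i for i in range(n))

def start_job(job_id: str, n: int) -> threading.Thread:
    t = threading.Thread(target=heavy_calculation, args=(job_id, n), daemon=True)
    t.start()
    return t

thread = start_job("series-1", 1_000)
thread.join()  # in a real callback you would poll instead of blocking
print(results["series-1"])  # -> 332833500
```

Note that threads in CPython share one interpreter lock, so this helps most when the heavy work releases the GIL (I/O, NumPy, etc.).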
The first option will speed up your app if you have many independent callbacks running (as they will run in parallel). If you instead have few but slow callbacks (i.e. their execution time is more than a few seconds), a better approach would be to do the heavy lifting asynchronously. While both option 2 and option 3 enable async processing, Celery comes with a lot of functionality out of the box. Hence, for your use case, Celery would be my first choice. For reference, here is a small example of how to run an async job in Dash using Celery.