I’m using Apache Airflow with the LocalExecutor on a machine that has an 8-core CPU.
Let’s say I have a DAG with two tasks that can run in parallel, and each of these tasks uses Python's multiprocessing module to spawn 5 subprocesses internally.
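For concreteness, here is roughly what I mean (a minimal sketch assuming Airflow 2.x; the DAG and task names are made up):

```python
from multiprocessing import Pool

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

def crunch(n):
    # Stand-in for CPU-bound work.
    return sum(i * i for i in range(n))

def run_with_pool():
    # Each Airflow task spawns 5 worker processes internally.
    with Pool(processes=5) as pool:
        pool.map(crunch, [10_000_000] * 5)

with DAG(
    dag_id="two_parallel_multiproc_tasks",
    start_date=pendulum.datetime(2024, 1, 1),
    schedule=None,
) as dag:
    # No dependency between the two tasks, so they can run at the same time.
    task_a = PythonOperator(task_id="task_a", python_callable=run_with_pool)
    task_b = PythonOperator(task_id="task_b", python_callable=run_with_pool)
```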
This leads me to some confusion:
That’s effectively 2 tasks × 5 processes = 10 total subprocesses, all trying to run on an 8-core machine. I’ve seen that in the default airflow.cfg, the parallelism value is set to 32. My questions are:
How does Airflow handle this scenario?
If each task internally spawns multiple subprocesses, won’t this lead to more processes than available CPU cores?
Is Airflow aware of the internal multiprocessing done inside the tasks?
Will it take into account those internal subprocesses when managing task concurrency?
Why is parallelism set to 32 by default in airflow.cfg?
On an 8-core machine, wouldn't that allow far more tasks than my CPU can handle efficiently? Say I have 32 tasks to run in parallel in Airflow but only an 8-core CPU; how is that even possible?
I'm trying to understand how Airflow balances these configurations with the actual hardware resources available, and how I should approach setting these values in a real-world scenario.
Any clarification or insights would be greatly appreciated. Thanks!
How does Airflow handle this scenario?
Airflow does not know or care about multiprocessing inside your tasks.
When Airflow (using the LocalExecutor) starts a task, it just launches one Python process. What happens inside that process is invisible to Airflow: if your task uses multiprocessing to spawn 5 child processes, Airflow is completely unaware and still sees "one running task."
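A quick way to see this is to log the PIDs from inside a task: the process Airflow launched is one PID, and the pool workers are separate OS processes that Airflow never tracks. A minimal sketch:

```python
import os
from multiprocessing import Pool

def worker(i):
    print(f"worker {i}: pid={os.getpid()} (parent={os.getppid()})")

def task_body():
    # This is the one process Airflow launched and supervises.
    print(f"task process: pid={os.getpid()}")
    # These five children exist only at the OS level; Airflow never sees them.
    with Pool(processes=5) as pool:
        pool.map(worker, range(5))

if __name__ == "__main__":
    task_body()
```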
If each task internally spawns multiple subprocesses, won’t this lead to more processes than available CPU cores?
You can easily end up with more running processes (OS-level) than CPU cores.
The OS (Linux, Windows, whatever) will schedule them — some processes will run, some will wait.
But if all the processes are CPU-heavy, oversubscription causes contention and slowdowns: context switching, CPU thrashing, etc.
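A common mitigation is to size the pool from the cores the process can actually use instead of hard-coding 5. Here is one sketch (the reserve parameter is my own convention, not anything Airflow-specific):

```python
import os
from multiprocessing import Pool

def n_workers(reserve=1):
    """Worker count based on the CPUs actually available to this process."""
    try:
        # Respects CPU affinity where the OS exposes it (Linux).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback on platforms without sched_getaffinity (macOS, Windows).
        available = os.cpu_count() or 1
    return max(1, available - reserve)

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=n_workers()) as pool:
        print(pool.map(square, range(10)))
```

If two such tasks can run at the same time, you may want to divide the budget further (for example, available cores divided by the number of concurrent tasks).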
Is Airflow aware of the internal multiprocessing done inside the tasks?
No. Airflow only manages DAGs and tasks: one task = one unit of work.
Will it take into account those internal subprocesses when managing task concurrency?
No. Airflow will not consider your task's internal spawning at all. You must design and configure Airflow (and your code) accordingly; one common pattern is sketched below.
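For example, if some tasks fan out internally like yours do, you can gate them behind an Airflow pool so only a couple run at once, no matter what parallelism would otherwise allow. The pool name and slot count below are made up, and the pool must be created first (e.g. airflow pools set cpu_heavy 2 "CPU-bound tasks"):

```python
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

def heavy_work():
    pass  # CPU-bound body that spawns its own subprocesses

with DAG(
    dag_id="pooled_heavy_tasks",  # hypothetical name
    start_date=pendulum.datetime(2024, 1, 1),
    schedule=None,
) as dag:
    for i in range(4):
        PythonOperator(
            task_id=f"heavy_{i}",
            python_callable=heavy_work,
            pool="cpu_heavy",  # at most 2 of these 4 run concurrently
        )
```

Per-DAG limits like max_active_tasks achieve something similar when all the heavy tasks live in one DAG.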
Why is parallelism set to 32 by default in airflow.cfg?
That's just the default, not a number tuned for an 8-core machine.
In Apache Airflow, "running 32 tasks in parallel" means 32 processes, not threads. So technically you can, because the OS schedules processes independently of the core count, but performance can suffer badly if the tasks are all CPU-bound.
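You can measure the cost of oversubscription directly. This sketch runs the same CPU-bound job with 8 and then 32 processes; on an 8-core machine the 32-process run takes roughly four times as long in wall-clock time, because the OS simply time-slices the extra processes (exact numbers will vary by machine):

```python
import multiprocessing as mp
import time

def burn(_):
    # Pure CPU work, no I/O.
    total = 0
    for i in range(10_000_000):
        total += i
    return total

def timed_run(n_procs):
    start = time.perf_counter()
    with mp.Pool(processes=n_procs) as pool:
        pool.map(burn, range(n_procs))  # one job per process
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (8, 32):
        print(f"{n} processes: {timed_run(n):.1f}s")
```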