I have an Airflow cluster running with Docker. Everything seems to work fine, except that the DAG Processor status is not reported properly.
When I check the cluster status, the DAG Processor is shown as Unknown, so I am trying to understand how to get its status reported correctly.
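For reference, the same status can also be queried from the webserver's /health endpoint (assuming the webserver port is published on localhost:8080, as in the standard Docker Compose setup):
curl -s http://localhost:8080/health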
I have confirmed components are running fine and do have healthy status:
docker ps | grep airflow
b3945b8cd533 apache/airflow:2.7.1 "/usr/bin/dumb-init …" 3 days ago Up 3 days (healthy) 8080/tcp airflow_airflow-webserver_1
c19eb3571734 apache/airflow:2.7.1 "/usr/bin/dumb-init …" 3 days ago Up 3 days (healthy) 8080/tcp airflow_airflow-worker_1
07403ea2cdd0 apache/airflow:2.7.1 "/usr/bin/dumb-init …" 3 days ago Up 3 days (healthy) 8080/tcp airflow_airflow-scheduler_1
6b53355eae24 apache/airflow:2.7.1 "/usr/bin/dumb-init …" 3 days ago Up 3 days (healthy) 8080/tcp airflow_airflow-triggerer_1
1f8dfbc2766b redis:latest "docker-entrypoint.s…" 4 days ago Up 4 days (healthy) 6379/tcp airflow_airflow-cache_1
53c2926b6eea postgres:16 "docker-entrypoint.s…" 4 days ago Up 4 days (healthy) 5432/tcp airflow_airflow-database_1
Checking the configuration, the DAG processor manager log location is properly defined:
dag_processor_manager_log_location = /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
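The effective value can also be confirmed from inside one of the containers with airflow config get-value (here using the worker container, as below):
docker exec -it airflow_airflow-worker_1 airflow config get-value logging dag_processor_manager_log_location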
And the file does exist at the required location:
docker exec -it airflow_airflow-worker_1 bash
ls -l /opt/airflow/logs/dag_processor_manager
total 18444
-rw-rw-r-- 1 default root 18880639 Oct 10 06:15 dag_processor_manager.log
Here are my questions: why is my DAG Processor status Unknown, and how should I configure the cluster so that its status is reported correctly?
It doesn't look like you are running a standalone DagProcessorManager component in your environment. As of Airflow 2.3.0, the DagProcessorManager can be run as a separate process to decouple DAG file parsing from the Scheduler.
This mode is disabled by default. You enable it with the standalone_dag_processor option in the [scheduler] section (or the AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR environment variable) and then run the airflow dag-processor command as its own service. Once that process is running and heartbeating, the DAG Processor health should be reported correctly by the /health endpoint and in the UI.
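If you are using the official Docker Compose file, a minimal sketch of such a service could look like the following (the service name and the x-airflow-common / x-airflow-common-env anchors are assumptions based on the standard compose layout; note that the standalone flag must also be visible to the scheduler container, so in practice it belongs in the shared environment block):

  airflow-dag-processor:
    <<: *airflow-common
    # run the standalone DAG processor instead of parsing DAG files inside the scheduler
    command: dag-processor
    environment:
      <<: *airflow-common-env
      AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR: 'true'
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

After recreating the stack (docker compose up -d), the new container should appear alongside the others and the DAG Processor status should switch from Unknown to healthy.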