I have a production Composer 3 environment running Airflow 2.9.3-build.12. Everything works fine except for updating DAGs. When I create a new DAG and upload it to the DAGs folder, it takes just a few seconds to appear in the Airflow UI. Whenever I update an existing DAG, however, I have to wait 7 to 9 minutes to actually see the changes. I tried disabling the DAG before uploading the changes and then re-enabling it, but there was no improvement.
I have 100 DAGs, with hundreds of Connections and Variables. My resources are set up as follows:
Workloads configuration
Scheduler: 3 schedulers with 2 vCPU, 4 GB memory, 5 GB storage each
DAG processor: 1 DAG processor with 2 vCPU, 7.5 GB memory, 5 GB storage
Triggerer: 1 triggerer with 0.5 vCPU, 1 GB memory, 1 GB storage
Web server: 1 vCPU, 7.5 GB memory, 5 GB storage
Worker: Autoscaling between 2 and 6 workers, with 2 vCPU, 7.5 GB memory, 20 GB storage each
Meanwhile, in the Airflow Configuration Overrides I set:
core.parallelism : 51
celery.worker_concurrency : 17
scheduler.dag_dir_list_interval : 300
scheduler.min_file_process_interval : 30
scheduler.scheduler_heartbeat_sec : 10
scheduler.file_parsing_sort_mode : modified_time
I've been trying to tune the scheduler, going from 1 scheduler to 3, but not a single change seems to have any effect on the issue. I'd also like some clarification on how to investigate the logs properly.
I've read many of the Airflow docs, which merely explain the parameter definitions, and some guides (this is the one that helped me most, at least to understand a few things about the resources).
Any help on how to avoid waiting 7 to 10 minutes every time I need to update one of the 100 DAGs? Thank you in advance.
Solved it, temporarily, by enabling the use_cache param for the Composer instance.
We have many DAGs, and each DAG uses many Variables. As a result, every time the Composer instance re-parses a DAG, it also has to look up all of the Variables associated with it.
A really bad legacy pattern.
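To find which DAG files are affected by this pattern, something like the following can help. It is a minimal sketch using only the standard library: it flags calls written literally as Variable.get(...) that sit outside a function or lambda body, which are the ones that run on every parse. The function name and the sample source are hypothetical, and the check is only a heuristic (it will miss aliased imports and calls placed in default arguments or decorators).

```python
import ast


def top_level_variable_gets(source: str) -> list[int]:
    """Return line numbers of Variable.get(...) calls that run at import time."""
    hits = []

    def visit(node):
        # Function and lambda bodies only run when called, so skip them.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            return
        # Match calls of the literal form Variable.get(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "Variable"
        ):
            hits.append(node.lineno)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sorted(hits)


# Hypothetical DAG file content, used only to illustrate the check.
SAMPLE = '''
from airflow.models import Variable

bucket = Variable.get("bucket")

def build_path():
    return Variable.get("prefix")

with DAG("example") as dag:
    region = Variable.get("region", default_var="eu")
'''

print(top_level_variable_gets(SAMPLE))
```

The source is only parsed, never executed, so this can run on a machine without Airflow installed. The call inside build_path is not reported because it only executes when the function is called, not when the file is parsed.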
While I wait for the right time to change this structure, setting use_cache to True makes parsing faster and the propagation of Variable changes slower, which is fine by me!
The parsing time dropped from almost 10 minutes to 10 seconds, no joke.
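The trade-off described above (faster parsing, slower propagation of Variable changes) is what any time-to-live cache gives you. The sketch below is only an illustration of that idea in plain Python, not Airflow's actual implementation: TTLCache, fetch, and the 900-second value are assumptions made up for the example. In Airflow itself the option lives, as far as I know, in the [secrets] config section (use_cache, with cache_ttl_seconds controlling how long entries are kept).

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookups for ttl_seconds so repeated reads skip the slow backend."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # slow lookup, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        # Serve the cached value while it is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise hit the backend and remember the result.
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


# Stand-in for the metadata database; the dict and key are illustrative.
backend = {"bucket": "a"}
calls = []


def fetch(key: str) -> str:
    calls.append(key)
    return backend[key]


now = [0.0]  # fake clock, advanced by hand below
cache = TTLCache(fetch, ttl_seconds=900, clock=lambda: now[0])

first = cache.get("bucket")    # goes to the backend
second = cache.get("bucket")   # served from the cache
backend["bucket"] = "b"        # the Variable is changed
now[0] = 899
stale = cache.get("bucket")    # still the old value
now[0] = 900
fresh = cache.get("bucket")    # entry expired, new value is fetched
```

With hundreds of Variables read on every parse, each cache hit is one backend round trip saved, which is where the speed-up comes from. The cost is that a changed value can stay invisible for up to the TTL.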