airflowairflow-2.xairflow-webserver

Airflow UI failing to fetch same-task different-retry logs from different nodes in cluster


I kind-of "inherited" a project that uses Airflow 2.2.4 installed on a cluster of several nodes (meaning that I wasn't part of the deployment decisions and configurations and I might not be aware of some under-the-hood processes). Each node runs a scheduler, a CeleryExecutor and a webserver. Task logging is done locally on the nodes' file system. However there must be some misconfiguration somewhere and I can't figure it out. Here is what I have observed:

Example of UI error message:

*** Log file does not exist: [install_path]/airflow/logs/start_acquisition/run_writegofile/2022-07-18T01:00:00+00:00/1.log
*** Fetching from: http://nodeb.mycompany.com:19793/log/start_acquisition/run_writegofile/2022-07-18T01:00:00+00:00/1.log
*** Failed to fetch log file from worker. Client error '404 NOT FOUND' for url 'http://nodeb.mycompany.com:19793/log/start_acquisition/run_writegofile/2022-07-18T01:00:00+00:00/1.log'
For more information check: https://httpstatuses.com/404

Example of correct log fetching message:

*** Log file does not exist: [install_path]/airflow/logs/start_msci_acquisition/run_writegofile/2022-07-18T01:00:00+00:00/2.log
*** Fetching from: http://nodeb.mycompany.com:19793/log/start_acquisition/run_writegofile/2022-07-18T01:00:00+00:00/2.log

Sorry I had to mask out some sensitive info. More than happy to provide more details about the configuration or else, not sure what can be useful here.


Solution

  • Found out this is a known issue with cluster deployment and local storage of task logs: https://github.com/apache/airflow/pull/23178

    Unfortunately it doesn't seem anyone is actively merging this PR.