
Capture Airflow run duration


I need to gather the run duration of a particular Airflow job over the last 3 months.

In our CDE environment we use Airflow to call Spark DBT jobs. Lately, the run duration of these jobs has increased drastically. I assume there is a way to gather this job's run durations for further analysis. Hoping for some assistance / guidance in getting this done.

CDP Public Cloud / CDE Version - 1.19.3-b29

Thanks

Capturing these details via the CDE UI is not feasible, so I want to know of other methods to get this information.


Solution

  • Task duration is available in the Airflow GUI: in the Grid view, click the task name (its row) on the left, as shown in the Airflow docs here: https://airflow.apache.org/docs/apache-airflow/stable/ui.html


    If that is not enough (e.g. you need durations over a very long period, or you need to filter out some runs), you could also go directly to the Airflow metadata DB and query the task_instance table (ERD of the DB schema: https://airflow.apache.org/docs/apache-airflow/stable/database-erd-ref.html). Be careful if you try this, though: make sure you know what you are doing first, so you don't mess up your Airflow. Preferably connect in a read-only session.

    Edit: I just noticed you are using Airflow in Cloudera. I don't have experience with that, only with Airflow directly, but if you can access your Airflow, my answer should still apply.
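As a rough illustration of the metadata-DB route, here is a minimal Python sketch. It assumes an Airflow 2.2+ schema where task_instance has dag_id, task_id, run_id, start_date, duration (in seconds) and state columns; check the ERD for your version before relying on it. The demo uses an in-memory SQLite table as a stand-in, and the DAG/task names (dbt_dag, run_dbt) and sample rows are hypothetical. Against a real metadata DB (often PostgreSQL) you would pass your own read-only DB-API connection, switch the `?` placeholders to your driver's paramstyle (`%s` for psycopg2), and pass a timezone-aware datetime as `since`.

```python
# A minimal sketch, not an official Airflow tool. Assumes the Airflow 2.2+
# metadata schema: task_instance(dag_id, task_id, run_id, start_date,
# duration [seconds], state, ...).
import sqlite3
from datetime import datetime, timedelta
from statistics import mean

# '?' is the sqlite3 paramstyle; psycopg2 (PostgreSQL) would need %s instead.
QUERY = """
    SELECT run_id, duration
    FROM task_instance
    WHERE dag_id = ? AND task_id = ? AND state = ? AND start_date >= ?
    ORDER BY start_date
"""


def task_durations(conn, dag_id, task_id, since, state="success"):
    """Return [(run_id, duration_seconds), ...] for matching task instances."""
    cur = conn.cursor()
    cur.execute(QUERY, (dag_id, task_id, state, since))
    rows = cur.fetchall()
    cur.close()
    return rows


def summarize(durations):
    """Count, mean and max of the durations in seconds, skipping NULLs."""
    secs = [d for _, d in durations if d is not None]
    if not secs:
        return {"runs": 0}
    return {"runs": len(secs), "mean_s": mean(secs), "max_s": max(secs)}


def build_demo_db():
    """In-memory stand-in for the metadata DB; names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, run_id TEXT,"
        " start_date TEXT, duration REAL, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("dbt_dag", "run_dbt", "r0", "2023-10-01T02:00:00", 300.0, "success"),
            ("dbt_dag", "run_dbt", "r1", "2024-01-10T02:00:00", 600.0, "success"),
            ("dbt_dag", "run_dbt", "r2", "2024-03-10T02:00:00", 1800.0, "success"),
            ("dbt_dag", "run_dbt", "r3", "2024-03-11T02:00:00", None, "failed"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # "Last 3 months" approximated as 90 days before a fixed reference date;
    # in practice you would use the current UTC time instead.
    since = (datetime(2024, 3, 31) - timedelta(days=90)).isoformat()
    rows = task_durations(conn, "dbt_dag", "run_dbt", since)
    print(rows)             # [('r1', 600.0), ('r2', 1800.0)]
    print(summarize(rows))  # {'runs': 2, 'mean_s': 1200.0, 'max_s': 1800.0}
```

Filtering on state keeps failed or still-running attempts from skewing the trend; pass a different `state` (or drop that condition) if you want to look at those too. The rows returned are easy to export to CSV or a dataframe for further analysis.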