I need to gather the run duration (time) for the last 3 months for a particular Airflow job.
In our CDE environment we use Airflow to call Spark dbt jobs, and lately the run duration of the job has increased drastically. I assume there is a way to gather this job's run durations for further analysis. Hoping for some assistance or guidance in getting this done.
CDP Public Cloud / CDE Version - 1.19.3-b29
Thanks
Capturing the details via the CDE UI is not feasible, so I would like to know of other methods to get this information.
Task duration is available in the Airflow GUI: in the Grid view, click the task name (the row) on the left, as shown in the Airflow docs here: https://airflow.apache.org/docs/apache-airflow/stable/ui.html
If that is not enough (e.g. you need it for a very long period, or you need to filter out some runs), you could also go directly to the Airflow metadata DB and query the task_instance table. The ERD schema of the DB is here: https://airflow.apache.org/docs/apache-airflow/stable/database-erd-ref.html
Be careful if you try this, though: make sure you know what you are doing first, so you don't mess up your Airflow. Preferably connect with a read-only session.
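To make the metadata DB route concrete, here is a rough sketch of the kind of query I mean. It only reads from task_instance and uses columns that exist in Airflow 2.x (dag_id, task_id, run_id, start_date, end_date, duration, state). Everything else is an assumption for illustration: the function names, the DAG id "my_dag", and the in-memory SQLite table that stands in for the real metadata DB (which is typically PostgreSQL or MySQL, where the parameter placeholder is usually "%s" rather than "?"). I have not tried this against CDE.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Read-only query against the Airflow 2.x task_instance table.
# {p} is the driver's parameter placeholder ("?" for sqlite3, "%s" for psycopg2).
QUERY = """
SELECT dag_id, task_id, run_id, start_date, end_date, duration, state
FROM task_instance
WHERE dag_id = {p} AND start_date >= {p}
ORDER BY start_date
"""


def fetch_task_durations(conn, dag_id, since, placeholder="?"):
    """Return task instance rows for one DAG started on or after `since`.

    `conn` is any DB-API connection; `since` must be a value the driver
    accepts for comparison with start_date.
    """
    cur = conn.cursor()
    cur.execute(QUERY.format(p=placeholder), (dag_id, since))
    rows = cur.fetchall()
    cur.close()
    return rows


def build_demo_db(now):
    """Hypothetical stand-in for the metadata DB, only so the sketch runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, run_id TEXT, "
        "start_date TEXT, end_date TEXT, duration REAL, state TEXT)"
    )
    samples = [
        ("my_dag", "run_dbt", "run_recent", now - timedelta(days=10), 300.0),
        ("my_dag", "run_dbt", "run_old", now - timedelta(days=120), 120.0),
        ("other_dag", "run_dbt", "run_other", now - timedelta(days=5), 60.0),
    ]
    for dag_id, task_id, run_id, start, seconds in samples:
        end = start + timedelta(seconds=seconds)
        conn.execute(
            "INSERT INTO task_instance VALUES (?, ?, ?, ?, ?, ?, ?)",
            (dag_id, task_id, run_id, start.isoformat(), end.isoformat(),
             seconds, "success"),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    conn = build_demo_db(now)
    cutoff = (now - timedelta(days=90)).isoformat()  # roughly the last 3 months
    for row in fetch_task_durations(conn, "my_dag", cutoff):
        print(row)
```

The duration column is in seconds. Against a real metadata DB you would swap the demo connection for your own read-only connection and pass the matching placeholder; from there the rows can be dumped to CSV or a dataframe for the trend analysis.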
Edit: I just noticed you are using Airflow in Cloudera. I don't have experience with that, only with Airflow directly. But if you can access your Airflow instance, my answer should still apply.