Tags: python, pyspark, airflow

How to get the JobID for Airflow DAG runs?


When we execute a DAG run using the Airflow UI, the "Graph View" displays detailed information about each job run.

JobID is a unique identifier for each job run, typically formatted as scheduled__YYYY-MM-DDTHH:mm:ss.

I need this JobID for tracking and logging purposes, where I record the time taken by each task and each DAG run.

Therefore, my question is: how can I obtain the JobID from within the DAG that is currently being executed?


Solution

  • This value is actually called run_id and can be accessed via the context or macros.

    In the PythonOperator this is accessed via the context, and in the BashOperator it is accessed via Jinja templating on the bash_command field.

    More info on what's available in macros:

    https://airflow.apache.org/docs/stable/macros.html

    More info on Jinja templating:

    https://airflow.apache.org/docs/stable/concepts.html#jinja-templating

    from airflow.models import DAG
    from datetime import datetime
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator
    
    
    dag = DAG(
        dag_id='run_id',
        schedule_interval=None,
        start_date=datetime(2017, 2, 26)
    )
    
    def my_func(**kwargs):
        # With provide_context=True the task context is passed in as keyword
        # arguments; the DagRun object is available under the 'dag_run' key.
        context = kwargs
        print(context['dag_run'].run_id)
    
    t1 = PythonOperator(
        task_id='python_run_id',
        python_callable=my_func,
        provide_context=True,
        dag=dag
    )
    
    t2 = BashOperator(
        task_id='bash_run_id',
        # run_id is also exposed as a Jinja macro, so it can be templated
        # directly into the bash command.
        bash_command='echo {{ run_id }}',
        dag=dag)
    
    t1.set_downstream(t2)
    

    Use this DAG as an example and check the log for each operator; you should see the run_id printed there.
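
    To tie this back to the tracking/logging goal in the question, the same context can also be used to record how long a task or DAG run has been running, keyed by run_id. The following is only a minimal sketch of that idea; the log_duration helper and the use of airflow.utils.timezone are my own assumptions, not part of the answer above.

    from airflow.utils import timezone
    
    def log_duration(**context):
        # Hypothetical helper: log timing information keyed by run_id so it
        # can be correlated with other records later.
        dag_run = context['dag_run']
        ti = context['task_instance']
    
        # dag_run.start_date is timezone-aware, so compare against an aware "now".
        dagrun_elapsed = timezone.utcnow() - dag_run.start_date
    
        print('run_id=%s task=%s task_started=%s dagrun_elapsed=%s'
              % (dag_run.run_id, ti.task_id, ti.start_date, dagrun_elapsed))
    
    # Wired into the example DAG above, e.g.:
    # t3 = PythonOperator(
    #     task_id='log_timing',
    #     python_callable=log_duration,
    #     provide_context=True,
    #     dag=dag,
    # )
    # t2.set_downstream(t3)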