
How to schedule execution without delay?


I have a problem: I need to launch a DAG on the first day of every month, but the DAG that started on 1 October only executed that run on 1 November. I need the 1 October run to execute on 1 October and the 1 November run to execute on 1 November, without the execution being delayed by one month.

My schedule was: '0 10 1 * *'

Thanks


Solution

  • This is how Airflow works: it schedules a DAG run at the end of the interval that run covers. So if you have:

    from datetime import datetime

    from airflow import DAG

    dag = DAG(
        dag_id='tutorial',
        schedule_interval='0 10 1 * *',  # 10:00 on day 1 of every month
        start_date=datetime(2021, 10, 1),
    )
    

    The first run will start on 2021-11-01, and this run will have an execution date of 2021-10-01. This behavior is consistent with how data pipelines work: in November you want to process October's data. Put in terms of intervals, your monthly interval starts at the beginning of October and ends at the beginning of November, so at the beginning of November you can run the job that processes October's data.
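To make the interval arithmetic concrete, here is a stdlib-only sketch that models (it does not call) Airflow's behavior for a cron of the form `M H 1 * *`: the first interval begins at the first matching time at or after `start_date`, and the run is triggered when that interval ends one month later. The helper names are hypothetical, and the model ignores timezones, catchup and other schedule shapes.

```python
from datetime import datetime


def next_month(dt: datetime) -> datetime:
    """Return the same day/time one calendar month later (only used with day=1)."""
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)


def first_monthly_run(start_date: datetime, hour: int = 10, minute: int = 0):
    """Model the first run of a '<minute> <hour> 1 * *' schedule (hypothetical helper).

    Returns (logical_date, run_at): the interval start the run is labelled with,
    and the moment the scheduler actually triggers it (the interval end).
    """
    # First schedule point at or after start_date: day 1 at hour:minute.
    candidate = start_date.replace(day=1, hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < start_date:
        candidate = next_month(candidate)
    logical_date = candidate
    run_at = next_month(logical_date)  # the run fires when the interval closes
    return logical_date, run_at


logical, run_at = first_monthly_run(datetime(2021, 10, 1))
print(logical, run_at)  # 2021-10-01 10:00:00 2021-11-01 10:00:00
```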

    That said, the job itself can process any interval you wish. For that you can use Airflow macros, which expose the run's dates to the task.
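Airflow renders template variables such as `{{ ds }}` (the logical date as `YYYY-MM-DD`) and, from 2.2, `{{ data_interval_start }}` / `{{ data_interval_end }}` (older releases expose `{{ next_ds }}`). A task can derive any window it likes from them. The stdlib sketch below, with a hypothetical `month_window` helper, shows one choice: anchor on the interval end, i.e. the day the run actually fires, and process *that* month instead of the previous one. It only manipulates dates; wiring it into an operator is not shown.

```python
from datetime import date


def month_window(anchor: date) -> tuple[date, date]:
    """Return [first day of anchor's month, first day of the following month)."""
    start = anchor.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


# For the run that fires on 2021-11-01, ds is 2021-10-01 and
# data_interval_end is 2021-11-01.
print(month_window(date(2021, 11, 1)))  # month of data_interval_end: November
print(month_window(date(2021, 10, 1)))  # month of ds: October
```

The first call prints `(datetime.date(2021, 11, 1), datetime.date(2021, 12, 1))`, the second `(datetime.date(2021, 10, 1), datetime.date(2021, 11, 1))`.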

    In simple words: if you want your first DAG run to start on 2021-10-01, you should set start_date=datetime(2021, 9, 1).
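The rule amounts to moving `start_date` back by one schedule interval. For a monthly schedule on day 1 that is the first day of the previous month; a hypothetical stdlib helper, assuming exactly that schedule shape:

```python
from datetime import datetime


def start_date_for_first_run(first_run: datetime) -> datetime:
    """Return the start_date whose first monthly interval closes in first_run's month.

    Hypothetical helper: assumes a monthly schedule on day 1, so only the
    year and month of first_run matter.
    """
    if first_run.month == 1:
        return datetime(first_run.year - 1, 12, 1)
    return datetime(first_run.year, first_run.month - 1, 1)


print(start_date_for_first_run(datetime(2021, 10, 1)))  # 2021-09-01 00:00:00
```

The run that fires on 2021-10-01 then carries the logical date 2021-09-01. Note that with a `start_date` in the past and catchup enabled (the default), the scheduler also creates runs for every missed interval since that date.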

    Airflow 2.2.0 brought an enhancement in this area: with the completion of AIP-39 (Richer Scheduler), Airflow decoupled "when to run" from "what interval to process". You can read about the concept of Timetables in this doc.
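The decoupling can be illustrated with a toy model (plain Python, not Airflow's `Timetable` API): an interval-style schedule fires a run at the end of the interval it covers, while a trigger-style schedule fires at the cron tick itself. Airflow releases from 2.4 onward ship the trigger-style behavior as `CronTriggerTimetable` in `airflow.timetables.trigger`; check the Timetables documentation for your version. The function name and mode strings below are hypothetical.

```python
from datetime import datetime


def _add_month(dt: datetime) -> datetime:
    """Same day/time one calendar month later (ticks here are always on day 1)."""
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)


def fire_time(tick: datetime, mode: str) -> datetime:
    """When a run for the monthly cron tick `tick` actually starts (toy model).

    'interval': the run covers [tick, next tick) and fires when it closes.
    'trigger':  the run fires at the tick itself, with no one-interval delay.
    """
    if mode == "interval":
        return _add_month(tick)
    if mode == "trigger":
        return tick
    raise ValueError(f"unknown mode: {mode!r}")


tick = datetime(2021, 10, 1, 10)
print(fire_time(tick, "interval"))  # 2021-11-01 10:00:00
print(fire_time(tick, "trigger"))   # 2021-10-01 10:00:00
```

For the question's schedule, the trigger-style behavior is what lets the 1 October run start on 1 October.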