airflow

Airflow - Task date is different than the date in the rendered template


I have an Airflow DAG that should run at 1 AM every day. From the task details, it does seem like it ran at 1 AM on the 24th (the time of this post).

But as you can see from the next run shown at the top right, it seems to be the same as the task's last run time.

And when looking at the run id in the screenshot above and at the rendered template, it seems like Airflow thinks the date is the 23rd.

This is a real issue since we have a one day delay on this job... Does anyone know why something like this can happen?


Solution

  • This is a real issue since we have a one day delay on this job

    It's not a delay. See https://stackoverflow.com/a/65196624/14624409 for an explanation of why this happens.

    If your DAG is:

    with DAG(dag_id='my_dag', schedule="0 1 * * *", start_date=datetime(2024, 6, 24)):

    Then the run for 2024-06-24 01:00 will actually start running on 2024-06-25 at 01:00, because the schedule has a 24-hour interval from 01:00 with a base date of 2024-06-24, and a run is triggered only once its interval has ended.

    If you want the first run to start on 2024-06-24, then you need to define your DAG as:

    with DAG(dag_id='my_dag', schedule="0 1 * * *", start_date=datetime(2024, 6, 23)):

    Note that this behavior is consistent with how data pipelines work: today you process yesterday's data. In other words, today (2024-06-24) you want to process the data in the window from start_date 2024-06-23 to end_date 2024-06-24.
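
To make the arithmetic concrete, here is a minimal sketch in plain Python (no Airflow import) of how the date a run is labeled with relates to the time it actually fires. The helper name `first_runs` and the fixed 24-hour `INTERVAL` are assumptions made for illustration only; this is not Airflow's scheduler code, just the same rule applied by hand to the `0 1 * * *` schedule.

```python
from datetime import datetime, timedelta

# Assumption: the cron expression "0 1 * * *" fires once every 24 hours at 01:00.
INTERVAL = timedelta(days=1)


def first_runs(start_date, count=3):
    """Return (labeled_date, actual_trigger_time) pairs for a daily 01:00 schedule.

    Hypothetical helper: each run is labeled with the START of its interval
    and is triggered only when that interval has ENDED.
    """
    # First 01:00 tick that falls on or after start_date.
    first_tick = start_date.replace(hour=1, minute=0, second=0, microsecond=0)
    if first_tick < start_date:
        first_tick += INTERVAL

    runs = []
    for i in range(count):
        labeled_date = first_tick + i * INTERVAL
        runs.append((labeled_date, labeled_date + INTERVAL))
    return runs


for start in (datetime(2024, 6, 24), datetime(2024, 6, 23)):
    print(f"start_date = {start:%Y-%m-%d}")
    for labeled, triggered in first_runs(start, count=2):
        print(f"  run labeled {labeled} is triggered at {triggered}")
```

With `start_date=datetime(2024, 6, 24)` the first run is labeled 2024-06-24 01:00 but fires on 2024-06-25 at 01:00; moving `start_date` back to 2024-06-23 makes the first run fire on 2024-06-24 at 01:00, which matches the dates above.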