Airflow’s Scheduler is a common source of confusion. If you’re new to Airflow and find yourself wondering, “Why is my DAG always 1 run behind?” - that’s actually expected behavior.
Airflow Documentation
Here’s an excerpt from Airflow’s documentation on scheduling:
Note that if you run a DAG on a schedule_interval of one day, the run stamped
2016-01-01 will trigger soon after 2016-01-01T23:59. In other words, the job
instance is started once the period it covers has ended.
Let's Repeat That: The scheduler runs your job one schedule_interval AFTER
the start date, at the END of the period.
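The rule in that excerpt comes down to one line of date arithmetic. Here is a minimal plain-Python sketch of it, assuming a fixed one-day schedule_interval; trigger_time is a hypothetical helper written for illustration, not part of Airflow.

```python
from datetime import datetime, timedelta


def trigger_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest moment the run stamped `execution_date` can start.

    Hypothetical helper: the run starts once the period it covers has ended,
    i.e. one schedule_interval after its stamp.
    """
    return execution_date + schedule_interval


stamp = datetime(2016, 1, 1)  # the run "stamped 2016-01-01" from the excerpt
print(trigger_time(stamp, timedelta(days=1)))  # 2016-01-02 00:00:00
```

The result lines up with the excerpt: the run stamped 2016-01-01 starts right after that day is over.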
Let’s say you want your DAG to run daily at 23:00 UTC.
- Your schedule would be:
0 23 * * *
(check out crontab.guru for help decoding schedules)
- Today is 7-29 at 23:30 UTC, but the latest run you see is 7-28 at 23:00, not 7-29
That’s expected. The run stamped 7-27 actually executed on 7-28 (yesterday) at 23:00. The run for data from 7-28 executed today (7-29) at 23:00 and is listed as 7-28, and so on and so forth. The run stamped 7-29 won’t start until 7-30 at 23:00.
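The same example can be checked with a few lines of plain Python. This is only a sketch under stated assumptions: a daily 23:00 UTC schedule, an arbitrary year (the text gives none), and a hypothetical helper named latest_started_run that is not part of Airflow.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=1)  # assumption: one run per day


def latest_started_run(now: datetime, hour: int = 23) -> datetime:
    """Return the stamp of the most recent run that has already started.

    Hypothetical helper for illustration only.
    """
    # Most recent scheduled tick (23:00 by default) at or before `now`.
    tick = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if tick > now:
        tick -= INTERVAL
    # The run that fires at this tick is stamped with the START of the
    # period that just ended, one interval earlier.
    return tick - INTERVAL


now = datetime(2019, 7, 29, 23, 30)  # the year is arbitrary
print(latest_started_run(now))  # 2019-07-28 23:00:00
```

Half an hour before the 23:00 tick, the same function would still report the 7-27 stamp, which is why the UI looks one cycle behind.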
Why?
The reasoning is that a daily run in the 11pm slot covers the 24 hours that follow its timestamp, so Airflow can’t be sure all of that period’s data exists until 11pm the next day, a full cycle later. In other words, seeing the latest run “1 cycle behind” is intentional on Airflow’s part.
Can I trigger a run at the literal time?
It’s a bit of an anti-pattern, but if you do want to adjust that timing, you should be able to use date template variables such as {{ prev_execution_date }} to make sure you’re pulling at the literal time (e.g. 11pm at 11pm that day). The full list of default variables is at https://airflow.apache.org/code.html#default-variables
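To see which timestamp each date variable would hold for a given run, here is a plain-Python sketch. It assumes a fixed one-day schedule_interval and an arbitrary year; date_context is a hypothetical helper, not Airflow’s API, and it only mirrors how prev_execution_date, execution_date, and next_execution_date relate to one another for scheduled runs.

```python
from datetime import datetime, timedelta


def date_context(execution_date: datetime, interval: timedelta = timedelta(days=1)) -> dict:
    """Approximate the date template variables for one scheduled run.

    Hypothetical helper: assumes a fixed interval between runs.
    """
    return {
        "prev_execution_date": execution_date - interval,
        "execution_date": execution_date,
        "next_execution_date": execution_date + interval,
    }


ctx = date_context(datetime(2019, 7, 28, 23, 0))  # the run stamped 7-28
for name, value in ctx.items():
    print(f"{name}: {value:%m-%d %H:%M}")
# prev_execution_date: 07-27 23:00
# execution_date: 07-28 23:00
# next_execution_date: 07-29 23:00
```

Under these assumptions, next_execution_date for the run stamped 7-28 lands on 7-29 at 23:00, the moment that run actually starts, while prev_execution_date points one cycle earlier. It is worth checking which of the two matches the window you mean to pull before templating it into a task.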
Of course, you can always manually trigger a single DAG run by hitting the “Play” button in the Airflow UI. That’ll force your DAG to kick off immediately.