Airflow Pro-Tip: Scheduler will run your job one schedule_interval AFTER the start date

Airflow’s Scheduler is a common source of confusion. If you’re new to Airflow and find yourself wondering, “Why is my DAG always 1 run behind?” - that’s actually expected behavior.

Airflow Documentation

Here’s a blurb from Airflow’s documentation on scheduling:

Note that if you run a DAG on a schedule_interval of one day, the run stamped
2016-01-01 will trigger soon after 2016-01-01T23:59. In other words, the job
instance is started once the period it covers has ended.

Let's Repeat That: The scheduler runs your job one schedule_interval AFTER
the start date, at the END of the period.

Let’s say you want your DAG to run daily at 23:00 UTC.

  • Your schedule would be: 0 23 * * * (check out crontab.guru for help decoding schedules)
  • Today is 7-29 at 23:30 UTC, but the latest run you see is stamped 7-28 at 23:00, not 7-29

That’s expected. The run stamped 7-27 actually started on 7-28 (yesterday) at 23:00. The run stamped 7-28 started today (7-29) at 23:00, once the period it covers had ended, and it is listed under 7-28. The run stamped 7-29 won’t start until 7-30 at 23:00, and so on.
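The arithmetic behind that example can be sketched in plain Python, without Airflow installed. This is only an illustration of the rule "a run starts one schedule_interval after its stamp", not Airflow's own scheduling code. The function names, the year 2019, and the first run stamp of 7-01 are assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Daily interval, matching the cron schedule 0 23 * * *
SCHEDULE_INTERVAL = timedelta(days=1)


def trigger_time(run_stamp):
    """Return when the run stamped `run_stamp` actually starts:
    one full schedule_interval later, at the end of its period."""
    return run_stamp + SCHEDULE_INTERVAL


def latest_started_run(now, first_stamp):
    """Return the stamp of the newest run that has started by `now`,
    or None if not even the first run has started yet."""
    if now < trigger_time(first_stamp):
        return None
    # Number of full intervals that have elapsed since the first stamp
    completed = (now - first_stamp) // SCHEDULE_INTERVAL
    return first_stamp + (completed - 1) * SCHEDULE_INTERVAL


# Hypothetical dates: the year and the first stamp are not from the article
now = datetime(2019, 7, 29, 23, 30)
first_stamp = datetime(2019, 7, 1, 23, 0)

print(latest_started_run(now, first_stamp))          # newest stamp visible right now
print(trigger_time(datetime(2019, 7, 29, 23, 0)))    # when the 7-29 run will start
```

With these inputs the newest visible stamp is 7-28 at 23:00, even though it is already 23:30 on 7-29, which is the "one run behind" effect described above.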

Why?

The reasoning is that a daily run in the 11pm slot covers the 24 hours that begin at that time, so Airflow can’t be sure all of that period’s data exists until 11pm the next day, a full cycle later. In other words, seeing the latest run “1 cycle behind” is intentional on Airflow’s part.

Can I trigger a run at the literal time?

It’s a bit of an anti-pattern, but if you do want to adjust that timing you should be able to use the {{ next_execution_date }} template variable, which marks the end of the period a run covers and so lines up with the time the run actually starts (e.g. 11pm at 11pm that day). See https://airflow.apache.org/code.html#default-variables
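As a rough sketch of how a task might read the literal time, the function below takes the task context as keyword arguments, the way a Python callable can receive it from Airflow, and returns both the run's stamp and the end of its period. The function name `pull_window` and the hand-built context dict are hypothetical; here the function is called directly so the example runs without Airflow. Variable names differ between Airflow versions, so check the default-variables page for the release you run.

```python
from datetime import datetime


def pull_window(**context):
    """Return (start, end) of the period this run covers.

    `execution_date` is the stamp shown in the UI; `next_execution_date`
    is one interval later, roughly when the run really starts.
    """
    start = context["execution_date"]
    end = context["next_execution_date"]
    return start, end


# Hand-built stand-in for the context Airflow would supply (hypothetical values)
example_context = {
    "execution_date": datetime(2019, 7, 28, 23, 0),
    "next_execution_date": datetime(2019, 7, 29, 23, 0),
}

start, end = pull_window(**example_context)
print(f"run stamped {start:%m-%d %H:%M} starts at about {end:%m-%d %H:%M}")
```

Querying by `end` rather than `start` is what pulls data "at the literal time"; keep in mind that data for the period may still be incomplete at that moment, which is why the default stamp points at the start of the period.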

Of course, you can always manually trigger a single DAG run by hitting the “Play” button in the Airflow UI. That’ll force your DAG to kick off immediately.