Airflow start date concepts

I find the concept of start date a little confusing, so I created a doc to help my team get familiar with it. The terminology used here may not be 100% correct, but it should give you an idea to get started and understand the concept.

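  • start_date = the start of the first schedule interval (the first execution_date), not the time the first run actually starts. Keep it STATIC.
  • execution_date = max(start_date, last_run_date), where last_run_date is the end of the previous run's schedule interval (i.e. when that run was triggered)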
  • schedule_interval parameter accepts cron or timedelta values
  • next_dag_start_date = execution_date + schedule_interval
  • On the home page, Last Run shows the execution_date. Hover over the (i) icon to see the time the run actually started.
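A minimal sketch of the bullet formulas above in plain Python, assuming last_run_date means the end of the previous run's schedule interval and that there is no catchup backlog. The function names are hypothetical; this models the rules of thumb, not Airflow's scheduler code.

```python
from datetime import datetime, timedelta
from typing import Optional


def execution_date(start_date: datetime, last_run_date: Optional[datetime]) -> datetime:
    """execution_date = max(start_date, last_run_date); the first run has no last_run_date."""
    if last_run_date is None:
        return start_date
    return max(start_date, last_run_date)


def next_dag_start_date(start_date: datetime,
                        last_run_date: Optional[datetime],
                        schedule_interval: timedelta) -> datetime:
    """next_dag_start_date = execution_date + schedule_interval."""
    return execution_date(start_date, last_run_date) + schedule_interval


start = datetime(2019, 10, 13, 15, 50)
hourly = timedelta(hours=1)

first = next_dag_start_date(start, None, hourly)    # 2019-10-13 16:50
second = next_dag_start_date(start, first, hourly)  # 2019-10-13 17:50
print(execution_date(start, None), first)
print(execution_date(start, first), second)
```

Note how each run's execution_date is exactly one schedule_interval behind the time it actually starts.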

It is always advisable to use a STATIC start_date in a DAG.
E.g.: 'start_date': datetime(2019, 10, 13, 15, 50)
You can use airflow.utils.dates.days_ago(7), but it is not advisable and may cause issues: days_ago() returns midnight N days ago and is re-evaluated every time the DAG file is parsed, so the start_date moves forward by a day at 00:00 and the DAG can switch to the next day incorrectly.
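To see why a dynamic start_date is fragile, here is a hypothetical re-implementation of what days_ago(n) returns (midnight UTC, n days before "now"), evaluated at two parse times a couple of minutes apart across midnight. The helper days_ago_at and the fixed "now" values are assumptions for illustration; the real function reads the current clock each time the DAG file is parsed.

```python
from datetime import datetime, timedelta, timezone


def days_ago_at(now: datetime, n: int) -> datetime:
    """Hypothetical stand-in for airflow.utils.dates.days_ago(n):
    midnight of the current UTC day, minus n days."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# The scheduler re-parses the DAG file repeatedly; simulate two parses.
before_midnight = datetime(2019, 10, 13, 23, 59, tzinfo=timezone.utc)
after_midnight = datetime(2019, 10, 14, 0, 1, tzinfo=timezone.utc)

start_1 = days_ago_at(before_midnight, 7)  # 2019-10-06 00:00 UTC
start_2 = days_ago_at(after_midnight, 7)   # 2019-10-07 00:00 UTC
print(start_2 - start_1)  # 1 day, 0:00:00
```

Two parses two minutes apart see start_dates a full day apart, whereas a literal datetime(2019, 10, 13, 15, 50) gives the scheduler the same anchor on every parse.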

The schedule_interval parameter accepts cron or timedelta values. The scheduler starts the next DAG run using the formula
next_dag_start_date = max(start_date, last_run_date) + schedule_interval

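E.g. if your start_date = datetime(2019, 10, 13, 15, 50) and schedule_interval = timedelta(hours=1). (A cron schedule such as 0 * * * * or @hourly fires on the hour instead, so the runs would land at :00 rather than :50.)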
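The :50 times in this example follow from a timedelta interval, which counts forward from start_date; the cron expression 0 * * * * (@hourly) instead ticks at the top of each hour. A minimal sketch comparing the two, assuming the first cron interval begins at the first cron tick at or after start_date. next_top_of_hour is a hypothetical helper that handles only this one cron expression; it is not a general cron parser.

```python
from datetime import datetime, timedelta


def timedelta_ticks(start_date: datetime, interval: timedelta, count: int) -> list:
    """Interval boundaries for a timedelta schedule: start_date, start_date + interval, ..."""
    return [start_date + i * interval for i in range(count)]


def next_top_of_hour(dt: datetime) -> datetime:
    """Hypothetical helper for '0 * * * *' only: the first minute-0 tick at or after dt."""
    floored = dt.replace(minute=0, second=0, microsecond=0)
    return floored if floored == dt else floored + timedelta(hours=1)


def hourly_cron_ticks(start_date: datetime, count: int) -> list:
    """Interval boundaries for '0 * * * *', aligned to the top of the hour."""
    first = next_top_of_hour(start_date)
    return [first + timedelta(hours=i) for i in range(count)]


start = datetime(2019, 10, 13, 15, 50)
print([t.strftime("%H:%M") for t in timedelta_ticks(start, timedelta(hours=1), 3)])
# ['15:50', '16:50', '17:50']
print([t.strftime("%H:%M") for t in hourly_cron_ticks(start, 3)])
# ['16:00', '17:00', '18:00']
```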

Case a) The current time is before start_date, e.g. 2019-10-13 00:00. Your DAG's first run will be scheduled at 2019-10-13 16:50 (with execution_date 2019-10-13 15:50), and subsequently every hour.
Please note that it will not start at start_date (2019-10-13 15:50), but rather at execution_date + schedule_interval.

Case b) The current time is after start_date, e.g. 2019-10-14 00:00. The scheduler will catch up by immediately creating the runs for every interval that has already ended, i.e. the runs due at 2019-10-13 16:50, 2019-10-13 17:50, 2019-10-13 18:50 … up to 2019-10-13 23:50 (execution_dates 15:50 through 22:50).
Then it will wait until 2019-10-14 00:50 for the next run.
Please note that the catchup can be avoided by setting catchup=False in the DAG properties.
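Both cases can be checked with a small simulation, assuming a timedelta schedule and catchup enabled: a run becomes due once its whole interval [execution_date, execution_date + schedule_interval) has ended. due_runs and next_trigger are hypothetical helpers that model this rule; they are not Airflow's scheduler and do not model catchup=False or concurrency limits.

```python
from datetime import datetime, timedelta


def due_runs(start_date: datetime, interval: timedelta, now: datetime) -> list:
    """Execution dates of every interval that has fully ended by `now` (catchup on)."""
    runs = []
    execution_date = start_date
    # A run is triggered only at the END of its interval.
    while execution_date + interval <= now:
        runs.append(execution_date)
        execution_date += interval
    return runs


def next_trigger(start_date: datetime, interval: timedelta, now: datetime) -> datetime:
    """Wall-clock time at which the first not-yet-due run will be triggered."""
    pending = start_date + len(due_runs(start_date, interval, now)) * interval
    return pending + interval


start = datetime(2019, 10, 13, 15, 50)
hourly = timedelta(hours=1)

# Case a: now is before start_date, so nothing is due yet.
case_a_now = datetime(2019, 10, 13, 0, 0)
print(due_runs(start, hourly, case_a_now))      # []
print(next_trigger(start, hourly, case_a_now))  # 2019-10-13 16:50:00

# Case b: now is after start_date, so the whole backlog is due at once.
case_b_now = datetime(2019, 10, 14, 0, 0)
backlog = due_runs(start, hourly, case_b_now)
print(len(backlog), backlog[0], backlog[-1])    # 8 2019-10-13 15:50:00 2019-10-13 22:50:00
print(next_trigger(start, hourly, case_b_now))  # 2019-10-14 00:50:00
```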


Hey, thanks for the summary here. This is great. Just to add about the execution_date: Airflow runs DAGs at the end of the schedule interval. So for a DAG with an hourly schedule starting at 8am, the first DAG Run will be triggered at 9am, and the execution_date of that DAG Run will be 8am. So at 9am, the 8am DAG Run is triggered. You can think of it as “at 9am, I’m ready to process the 8am data, so run the workflow with a data date of 8am”. Hope that helps!
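The 8am/9am example above, as a sketch: for a run triggered at the end of an interval, the execution_date (the "data date") is one schedule_interval earlier. data_window is a hypothetical helper for illustration, and the calendar date 2019-10-13 is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def data_window(triggered_at: datetime, schedule_interval: timedelta) -> tuple:
    """Return (execution_date, interval_end) for a run triggered at `triggered_at`.

    The run is triggered at the END of its interval, so the execution_date
    is one schedule_interval before the trigger time.
    """
    return triggered_at - schedule_interval, triggered_at


hourly = timedelta(hours=1)
for hour in (9, 10, 11):
    execution_date, end = data_window(datetime(2019, 10, 13, hour), hourly)
    print(f"run at {end:%H:%M} processes data for {execution_date:%H:%M}")
# run at 09:00 processes data for 08:00
# run at 10:00 processes data for 09:00
# run at 11:00 processes data for 10:00
```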

Thanks for this, @sohiljain! Really helpful.

A related post on @AndrewHarmon’s comment, for anyone following: Airflow Pro-Tip: Scheduler will run your job one schedule_interval AFTER the start date