I find the concept of start_date a little confusing, so I created a doc for my team to get familiar with it. The terminology used here may not be 100% correct, but it should give you an idea to get started and understand the concept.
- start_date = the first DAG start time. Keep it STATIC
- execution_date = max(start_date, last_run_date)
- schedule_interval parameter accepts cron or timedelta values
- next_dag_start_date = execution_date + schedule_interval
- On the Home Page, Last Run is the execution_date. Hover over the ( i ) icon to see the actual last run time
It is always advisable to use a STATIC start_date in a dag
Eg: 'start_date': datetime(2019, 10, 13, 15, 50)
You can use airflow.utils.dates.days_ago(7), but it is not advisable and may cause issues: the value is relative to the current day, so the start_date moves at 00:00 and the DAG can switch to the next day incorrectly.
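To see why a relative start_date is risky, here is a minimal sketch using only the standard library. The helper `days_ago_like` is a hypothetical stand-in written for this note, not Airflow code; it assumes the relative date resolves to midnight of the current day minus n days.

```python
from datetime import datetime, timedelta

def days_ago_like(n, now):
    """Hypothetical stand-in for a relative start_date: midnight of (now - n days)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)

# A static start_date is the same no matter when the DAG file is parsed.
static_start = datetime(2019, 10, 13, 15, 50)

# A relative start_date depends on the parse time, so it jumps at midnight.
before_midnight = days_ago_like(7, now=datetime(2019, 10, 13, 23, 59))
after_midnight = days_ago_like(7, now=datetime(2019, 10, 14, 0, 1))

print(before_midnight)  # 2019-10-06 00:00:00
print(after_midnight)   # 2019-10-07 00:00:00
```

Two parses only two minutes apart give start dates a full day apart, which is the kind of shift a static value avoids.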
The schedule_interval parameter accepts cron or timedelta values. The next DAG run is initiated according to the formula:
next_dag_start_date = max(start_date, last_run_date) + schedule_interval
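Eg - if your start_date = datetime(2019, 10, 13, 15, 50) and schedule_interval = timedelta(hours=1). (A cron value such as 0 * * * * or @hourly fires at minute 0 of each hour, so the times below would fall on the hour instead of at :50.)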
Case a) current_time is before start_date, e.g. 2019-10-13 00:00. Your DAG will first be scheduled at
2019-10-13 16:50, and subsequently every hour.
Please note that it will not start at start_date (2019-10-13 15:50), but rather at execution_date + schedule_interval.
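Case a) can be checked with a few lines of plain Python. This is only a sketch of the simplified formula from this note, assuming an hourly timedelta interval; `next_dag_start_date` is a name made up here and is not an Airflow function.

```python
from datetime import datetime, timedelta

def next_dag_start_date(start_date, last_run_date, schedule_interval):
    """The note's formula: max(start_date, last_run_date) + schedule_interval."""
    # Before the first run there is no last_run_date, so start_date is used.
    if last_run_date is None:
        execution_date = start_date
    else:
        execution_date = max(start_date, last_run_date)
    return execution_date + schedule_interval

start_date = datetime(2019, 10, 13, 15, 50)
hourly = timedelta(hours=1)

first_run = next_dag_start_date(start_date, None, hourly)
second_run = next_dag_start_date(start_date, first_run, hourly)

print(first_run)   # 2019-10-13 16:50:00
print(second_run)  # 2019-10-13 17:50:00
```

The first run lands one full interval after start_date, which is why nothing happens at 15:50 itself.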
Case b) current_time is after start_date, e.g. 2019-10-14 00:00. Your DAG will be scheduled for
2019-10-13 16:50, 2019-10-13 17:50, 2019-10-13 18:50 ... and will subsequently catch up until it reaches 2019-10-13 23:50.
Then it will wait for the strike of 2019-10-14 00:50 for the next run.
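The catch-up in case b) can be reproduced with a small loop. Again this is a simplified model of the behaviour described in this note, assuming an hourly timedelta interval; `catchup_run_times` is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, timedelta

def catchup_run_times(start_date, schedule_interval, now):
    """Return (run times already due at `now`, first run time still in the future)."""
    due = []
    run_time = start_date + schedule_interval
    while run_time <= now:
        due.append(run_time)
        run_time += schedule_interval
    return due, run_time

start_date = datetime(2019, 10, 13, 15, 50)
backlog, next_run = catchup_run_times(
    start_date, timedelta(hours=1), now=datetime(2019, 10, 14, 0, 0)
)

print(backlog[0])    # 2019-10-13 16:50:00
print(backlog[-1])   # 2019-10-13 23:50:00
print(len(backlog))  # 8
print(next_run)      # 2019-10-14 00:50:00
```

Eight runs are overdue at midnight, and the ninth waits until 00:50.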
Please note that the catchup can be avoided by setting catchup=False in the DAG properties.
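The effect of the catchup flag can be sketched the same way. The assumption here, labelled as such, is that with catchup disabled only the most recent due interval gets a run and the older backlog is skipped; `runs_to_create` is a hypothetical helper for illustration, not Airflow's scheduler logic.

```python
from datetime import datetime, timedelta

def runs_to_create(start_date, schedule_interval, now, catchup=True):
    """Run times the simplified model would create when first looking at the DAG."""
    due = []
    run_time = start_date + schedule_interval
    while run_time <= now:
        due.append(run_time)
        run_time += schedule_interval
    # Assumption: with catchup disabled, only the latest due run is kept.
    return due if catchup else due[-1:]

start_date = datetime(2019, 10, 13, 15, 50)
now = datetime(2019, 10, 14, 0, 0)
hourly = timedelta(hours=1)

with_catchup = runs_to_create(start_date, hourly, now, catchup=True)
without_catchup = runs_to_create(start_date, hourly, now, catchup=False)

print(len(with_catchup))  # 8
print(without_catchup)    # [datetime.datetime(2019, 10, 13, 23, 50)]
```

In a real DAG the flag is passed as catchup=False when the DAG object is created.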