DAG next run is set to a past date, why?

Hi,
I have created a DAG scheduled to run every day at 21:00.

I set the properties in this way:

schedule_interval="0 21 * * *",
start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
catchup=False,

And in default_args I set:

"max_active_runs": 1,

I activated the DAG yesterday, 2022-05-02, and I saw it was executed immediately with date 2022-04-30, and the next run was set to 2022-05-01.
Today I see that the next DAG run is set to 2022-05-02, but today is 2022-05-03 and there is a run for 2022-05-01. What am I doing wrong?

Another question: how can I prevent a DAG from being executed immediately after activation? Should I change the start_date?

Thanks a lot!

Hi @abdujaparov

A DAG is triggered once the start_date + schedule_interval has elapsed. For example:

  • start date = Jan 1st, 2021 at 10am
  • schedule_interval = 10 minutes
  • This means that the first run will be triggered on 1/1/2021 at 10:10am (after the start date of 1/1/21 + schedule_interval of 10 mins has elapsed)

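So what happens in Airflow? Starting at 10:00am on 1/1/2021, Airflow waits 10 minutes (the schedule_interval). Once 10:10am hits, the DAG run for the 10:00am interval is triggered: 1/1/21 10:00am becomes that run's execution_date, and 1/1/21 10:10am is the run's own start_date (the moment it actually starts), which is not the same thing as the DAG's start_date.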
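The arithmetic in the example above can be sketched in plain Python. This is only an illustration of the rule "first run fires once start_date + schedule_interval has elapsed"; the helper name `first_run` is made up for this post and is not part of Airflow.

```python
from datetime import datetime, timedelta


def first_run(start_date, interval):
    """Return (execution_date, trigger_time) for the first scheduled run.

    Hypothetical helper: the execution_date is the start of the first
    interval, and the run is triggered once that interval has elapsed.
    """
    execution_date = start_date
    trigger_time = start_date + interval
    return execution_date, trigger_time


start = datetime(2021, 1, 1, 10, 0)
execution_date, trigger_time = first_run(start, timedelta(minutes=10))
print(execution_date)  # 2021-01-01 10:00:00
print(trigger_time)    # 2021-01-01 10:10:00
```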

Here are a few links that can help go into a bit more detail on scheduling:

To prevent a DAG from executing immediately after activation, set schedule_interval=None (the DAG will then run only when triggered manually).
Otherwise, even with catchup=False, the DAG will run the most recently completed schedule interval as soon as it is activated.
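Applied to the original question, this is roughly why the first run showed 2022-04-30. The sketch below is plain Python, not Airflow code: the helper name is hypothetical, and the 09:00 UTC activation time is an assumption (the post only gives the date). It finds the most recent daily 21:00 interval that has fully elapsed at a given moment.

```python
from datetime import datetime, timedelta, timezone


def latest_completed_daily_interval(now, hour=21):
    """Return (interval_start, interval_end) of the most recent daily
    interval that has fully elapsed at `now`, for a schedule that fires
    every day at `hour`:00. Hypothetical helper for illustration only."""
    last_tick = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if last_tick > now:
        # Today's 21:00 has not happened yet, so step back one day.
        last_tick -= timedelta(days=1)
    return last_tick - timedelta(days=1), last_tick


# Assumed activation time: 2022-05-02 at 09:00 UTC (before that day's 21:00).
activated = datetime(2022, 5, 2, 9, 0, tzinfo=timezone.utc)
start, end = latest_completed_daily_interval(activated)
print(start.date(), end.date())  # 2022-04-30 2022-05-01
```

The interval start (2022-04-30 21:00) is what shows up as the run's date. The interval starting 2022-05-01 21:00 only completes at 2022-05-02 21:00, so on 2022-05-03 the latest run is dated 2022-05-01 and the next one 2022-05-02, which matches what was observed.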

Hi,
I read your link, very interesting, but I have some doubts.

If I want a DAG that runs every day at 00:00 with a logical date that lags by 7 days, should I set:

schedule_interval=timedelta(days=7),
start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),

Is that right?

And what if I want a task that runs every day at 23:00 with a logical date equal to the date it actually runs? Is that possible, or will I have a delay of one day in any case?
Is it possible to have a difference less than 1 day between start date and logical date?

Thanks a lot.

Hi @abdujaparov - Keep in mind that the logical_date is the start of the data interval, not when the DAG is actually executed.
The start_date argument for the DAG marks the start of the DAG’s first data interval, not when tasks in the DAG will start running.

This page may be helpful with more detail: https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html?highlight=pass%20data#data-interval

If you were to use schedule_interval=timedelta(days=7), keep in mind that this will run every 7 counted days, not on a particular day of the week (such as every Monday).

Is it possible to have a difference less than 1 day between start date and logical date?

Yes, you can, for example, set a schedule_interval that runs the DAG every hour.

Hi,
@aspain so with timedelta I am saying “run this DAG every XX yyy”, where XX is the quantity and yyy the unit (every 7 days, every 15 minutes, etc.), is that correct?
I want to execute a DAG daily with a logical date that lags by 7 days, so I would like:

  • start_date: 2022-05-17 00:00, logical_date: 2022-05-10 00:00
  • start_date: 2022-05-18 00:00, logical_date: 2022-05-11 00:00
    etc…

If I set:

schedule_interval=timedelta(days=7)

the DAG is executed every 7 days, counting from the start_date, correct?

If I set:

schedule_interval="@daily"

the DAG is executed daily with a logical date that is start_date-1 (if the DAG run starts at 2022-05-17 00:00, the logical date is 2022-05-16 00:00), correct?

Is there no way to get what I would like (each run with logical date = start_date-7)?

Thanks a lot!

The logical date is tied to your interval, so if you want something to run daily but pull the last seven days of data, you will need to adjust the date at the operator level, not the DAG level. The logical date will always be the start of the interval.
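As a rough sketch of what "adjust the date at the operator level" could look like: keep the DAG on a daily schedule and have the task compute its own 7-day window from the logical date. The function below is plain Python with a made-up name (`lookback_window`); how the task receives the logical date (for example from the task context) is assumed here and not shown.

```python
from datetime import datetime, timedelta, timezone


def lookback_window(logical_date, interval=timedelta(days=1), lookback=timedelta(days=7)):
    """Return (window_start, window_end) for a task on a daily schedule
    that should read the last `lookback` worth of data.

    Hypothetical helper: window_end is the end of the data interval (the
    moment the run is triggered), window_start is `lookback` before that.
    """
    window_end = logical_date + interval
    window_start = window_end - lookback
    return window_start, window_end


# A daily run triggered at 2022-05-17 00:00 has logical date 2022-05-16 00:00.
logical_date = datetime(2022, 5, 16, tzinfo=timezone.utc)
window_start, window_end = lookback_window(logical_date)
print(window_start.date(), window_end.date())  # 2022-05-10 2022-05-17
```

This gives the 2022-05-17 run a window starting at 2022-05-10, which is the pairing asked for above, without changing the DAG's schedule.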

Edit: Just realized this post is really old, weird it was at the top of my forum list.