Airflow Scheduler not recognizing CRON with Month and/or Day of the month schedule

We are using Airflow version 1.10.12. We use an Admin/Variable to set the schedule_interval in the dag. But the CRON only works with the minute and hour placeholders; as soon as we fill in the third or fourth values, i.e. day and month, the dag doesn’t kick off. Here is the code:

from datetime import datetime

from airflow import DAG

# DLEMAIL, LOGONUSR and RFIHEDHED10 are set elsewhere in the file (not shown here).
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2021, 1, 2),
    'email_on_failure': True,
    'email_on_retry': False,
    'email': DLEMAIL,
    'run_as_user': LOGONUSR,
}
dag = DAG(
    'dag_hed_hed_xtrfi',
    max_active_runs=1,  # allow only one active run of this DAG at a time
    catchup=False,
    schedule_interval=RFIHEDHED10,  # cron string read from the Admin Variable
    description='SubBox for Extract-RFI',
    default_args=default_args,
    tags=['GXY'],
    is_paused_upon_creation=False,
)
where the value of RFIHEDHED10 is read from the Admin Variable and is “24 18 13 7 *”. But if we only have RFIHEDHED10 = “24 18 * * *”, then the dag will kick off. So we are struggling to understand the behavior. Sometimes, with only hours and minutes, the dag kicks off two minutes early. Sometimes we see two instances queued next to each other, so one is running while the other waits in the queue for the first one to finish. The scheduler’s behavior is all over the board and no one has been able to explain it. Any idea?

With a schedule_interval of “24 18 13 7 *”, you are scheduling a dagrun at 18:24 on July 13th (see crontab.guru).
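To make the reading of the expression concrete, here is a minimal standard-library sketch that finds the next time a five-field cron expression fires. It is not how Airflow evaluates cron strings; `cron_matches` and `next_fire_time` are hypothetical helpers that only understand `*` and plain integers, and the datetimes are naive values read as UTC.

```python
from datetime import datetime, timedelta


def cron_matches(expr, moment):
    """Return True if `moment` matches a five-field cron expression.

    Only `*` and plain integers are supported, and the day-of-week
    field must be `*` (an assumption made to keep the sketch short).
    """
    minute, hour, day, month, weekday = expr.split()
    if weekday != "*":
        raise ValueError("this sketch does not handle the day-of-week field")
    fields = [
        (minute, moment.minute),
        (hour, moment.hour),
        (day, moment.day),
        (month, moment.month),
    ]
    return all(spec == "*" or int(spec) == value for spec, value in fields)


def next_fire_time(expr, after):
    """Step forward one minute at a time until the expression matches."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = candidate + timedelta(days=366 * 4)  # give up after about four years
    while candidate < limit:
        if cron_matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no match found for %r" % expr)


start = datetime(2021, 1, 2)  # the start_date from the question
print(next_fire_time("24 18 13 7 *", start))  # 2021-07-13 18:24:00
print(next_fire_time("24 18 * * *", start))   # 2021-01-02 18:24:00
```

With day and month filled in, the expression matches only once a year, while the hour-and-minute form matches every day.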

You are essentially scheduling the DAG once a year. If there are no dagruns for your DAG, Airflow will create the dagrun for the current interval.

Airflow did NOT create a dagrun when you submitted this post because it wasn’t yet 18:24 UTC at that time, and the start date is the beginning of 2021.

The first dagrun would be scheduled at 18:24 on July 13th 2021 with execution date 2020-07-13 18:24:00.
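The pairing of execution date and trigger time follows from Airflow labelling a run with the start of its schedule interval and triggering it once that interval has ended. A minimal sketch of that pairing for the yearly expression “24 18 13 7 *” is below. `yearly_ticks` and `runs_for` are hypothetical helpers, and the sketch deliberately does not model start_date or catchup, which also decide whether Airflow actually creates a given run.

```python
from datetime import datetime


def yearly_ticks(first_year, count, month=7, day=13, hour=18, minute=24):
    """Fire times of "24 18 13 7 *" as naive datetimes (read as UTC)."""
    return [datetime(first_year + i, month, day, hour, minute) for i in range(count)]


def runs_for(ticks):
    """Pair each interval start (the execution date) with the interval end,
    which is when the run for that interval would be triggered."""
    return [
        {"execution_date": start, "triggered_at": end}
        for start, end in zip(ticks, ticks[1:])
    ]


for run in runs_for(yearly_ticks(2020, 3)):
    print(run["execution_date"], "->", run["triggered_at"])
# 2020-07-13 18:24:00 -> 2021-07-13 18:24:00
# 2021-07-13 18:24:00 -> 2022-07-13 18:24:00
```

The same pairing applies to a daily expression, only with one-day intervals, which is why a daily DAG appears to run "for yesterday".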

Airflow did create a dagrun when you changed the schedule_interval to “24 18 * * *” because the current interval is after the start_date and because there weren’t any dagruns yet (I’m assuming).
It probably created a dagrun with the execution date 2021-07-11 18:24:00.

I’ve never seen this happen before. Is the start_date of a dagrun before the schedule interval for that dagrun?

Please provide an example of the scenario you described, with the tooltip of the dagrun in the tree view.

When you say instance, do you mean task instances or dagrun instances?

It could be because of capacity limits set in your scheduler configuration or DAG configuration.

image

image
The two DAG instances kick off for two different days: one for the previous day and one for the current day. See the two images attached.
Also, the third image shows the last dag run, which was not kicked off by the CRON: the CRON is set to 16 16 13 7 *, and it never kicked off.

image