Monthly DAGs getting triggered daily

Hi,

All our monthly scheduled DAGs are getting triggered daily on Airflow 2.2. This issue started after we implemented an Airflow log purging job to clean up logs older than 40 days. Before the purging job was implemented, these monthly DAGs were running fine as per schedule.

Am I missing something here? Are there any steps to follow after implementing an Airflow log purging job? Please respond, as we are facing this issue in production; for now we have turned these monthly DAGs off and are running them manually as per schedule.

Thanks
Pradeep

Hey @meherp

Could you please share how you are purging the Airflow logs, and where those logs are stored? The only explanation I can think of is that you are purging DAG run data from the Airflow metadata DB while the DAG's catchup parameter is set to True, which would make the scheduler re-create the older DAG runs. Could you please confirm the values of the schedule_interval and catchup parameters?
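
For reference, here is roughly how catchup interacts with the schedule (a minimal sketch, not your DAG; the dag_id, schedule, and task here are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator  # Airflow 2.2

# With catchup=True (the default in 2.2), the scheduler back-fills a DAG run
# for every interval between start_date and now, so deleting old DAG run rows
# from the metadata DB makes those intervals look "missed" and they re-run.
# With catchup=False, only the most recent interval is scheduled.
with DAG(
    dag_id='catchup_demo',              # placeholder name
    start_date=datetime(2022, 1, 1),    # placeholder date
    schedule_interval='@monthly',       # placeholder schedule
    catchup=False,
) as dag:
    DummyOperator(task_id='hello_task')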

Thanks
Manmeet

Hi @manmeet,

Thanks for your kind response.

We are purging the logs using an Airflow DAG that ultimately runs the find command below and deletes the matching files.
The logs are stored under /AIRFLOW/airflow/logs.

DELETE_STMT : find /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log.1 /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log.2 /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log.3 /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log.4 /AIRFLOW/airflow/logs/dag_processor_manager/dag_processor_manager.log.5 /AIRFLOW/airflow/logs/dummy_DAG_daily/hello_task /AIRFLOW/airflow/logs/dummy_DAG_monthly/hello_task /AIRFLOW/airflow/logs/jam_cmp_jio_fiber_monetization_0468/branch_task_ct /AIRFLOW/airflow/logs/jam_cmp_jio_fiber_monetization_0468/branch_task_email /AIRFLOW/airflow/logs/scheduler/2023-03-20 /AIRFLOW/airflow/logs/scheduler/2023-03-21 /AIRFLOW/airflow/logs/scheduler/2023-03-22 /AIRFLOW/airflow/logs/scheduler/2023-03-23
…(big list of folders) …
-type f -mtime +40 -exec rm -f {} \;
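
A simplified sketch of how the cleanup DAG is wired (assuming a BashOperator; the dag_id and schedule here are illustrative, and the real bash_command carries the full path list shown above):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Simplified sketch of the log cleanup DAG; dag_id and schedule are
# illustrative, and the full path list is trimmed from the command.
with DAG(
    dag_id='airflow_log_cleanup',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    BashOperator(
        task_id='purge_old_logs',
        bash_command='find /AIRFLOW/airflow/logs -type f -mtime +40 -exec rm -f {} \\;',
    )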

Below are the Airflow parameters set for the DAG. It had been running fine for the last 8-9 months, but started running daily after we scheduled the log cleanup DAG.

from datetime import datetime, timedelta
import getpass

from airflow import DAG

l_start_date = datetime(2022, 3, 15)
l_schedule_interval = '30 04 4 * *'  # 04:30 on the 4th of every month

default_args = {
    'owner': getpass.getuser(),
    'depends_on_past': False,
    'start_date': l_start_date,
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_name,  # dag_name is defined elsewhere in our file
    schedule_interval=l_schedule_interval,
    default_args=default_args,
    catchup=False,
)

Hey @meherp

Could you please confirm the following:

  1. As per the schedule, the next run date is 4th April at 04:30, but do you see the DAG running for today as well, or are you only seeing retroactive runs? (See the croniter check after this list.)
  2. Can you check the DAG code in the Airflow UI to make sure the deployed code has the same schedule you pasted here?
  3. Are you, by any chance, deleting the metadata DB file by mistake?
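
To double-check what that cron expression means, here is a quick illustration with croniter (a library Airflow itself depends on; the dates are just examples):

from datetime import datetime

from croniter import croniter

# '30 04 4 * *' fires at 04:30 on the 4th day of every month
it = croniter('30 04 4 * *', datetime(2023, 3, 22))
print(it.get_next(datetime))  # 2023-04-04 04:30:00
print(it.get_next(datetime))  # 2023-05-04 04:30:00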

Long story short, a DAG's schedule should not be affected by cleaning up logs; something else is going on here. You can also try copying the same schedule into another dummy DAG and verifying it, as in the sketch below. Also, please make sure catchup=False is set in your DAG params.
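
Something like this would do, with placeholder names:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator  # Airflow 2.2

# Throwaway DAG with the same schedule, used only to observe when the
# scheduler actually triggers runs. All names here are placeholders.
with DAG(
    dag_id='schedule_verify_dummy',
    start_date=datetime(2022, 3, 15),
    schedule_interval='30 04 4 * *',
    catchup=False,
) as dag:
    DummyOperator(task_id='noop')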

Thanks
Manmeet

Hi,

To answer your questions:
catchup=False is already set in the DAG properties (as shown in the code in my previous post).

1 - We have already turned the DAGs off, as they were running daily from 9th March till 22nd March and impacting the business. We got an escalation from the client on the 22nd, then turned off all the monthly DAGs and have been running them manually as per schedule. Below is a screenshot from our monitoring tool from 22nd March.

The funny part is that it was not creating new log files; instead it was appending each day's run to the 9th March log. So, looking at the tree view in the UI, we could not see any new runs unless we opened the latest log, where each day's output was being appended.

2 - The code I pasted was taken from the UI itself.
3 - No, we have not touched any DB table clean-up. Only the Unix log clean-up job on /AIRFLOW/airflow/logs has been implemented.

Thanks,
Pradeep.