I’m deploying Airflow on Azure Kubernetes Service (AKS) using Helm. The Airflow DAG and log folders are mounted on Azure Blob Storage using the NFSv3 method. I noticed a surge of transactions every 15 minutes, even when there is no activity.
I checked the configuration reference in the Airflow documentation (link) and changed min_file_process_interval, which controls how often DAGs are re-parsed, to once every 5 minutes; that accounts for the smaller peaks on the chart.
Here’s some information about my current deployment:
Executor: Kubernetes Executor
min_file_process_interval: 300 seconds
Number of scheduler replicas: 2
Number of web servers: 2
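For reference, the values the scheduler actually runs with can be confirmed with a small sketch like the one below (assuming Airflow 2.x and access to a scheduler pod; key names vary between versions), since Helm chart values, airflow.cfg, and AIRFLOW__SCHEDULER__* environment variables can override one another:

```python
# check_scheduler_config.py -- hypothetical helper, run inside a scheduler pod, e.g.
#   kubectl exec -it <scheduler-pod> -- python /tmp/check_scheduler_config.py
# Assumes Airflow 2.x; some keys were renamed in other versions.
from airflow.configuration import conf

for key in (
    "scheduler_heartbeat_sec",    # how often the scheduler heartbeats its loop
    "min_file_process_interval",  # minimum seconds between re-parses of the same DAG file
    "dag_dir_list_interval",      # how often the DAG folder is re-listed for new files
    "parsing_processes",          # number of parallel DAG-parsing processes
):
    print(f"[scheduler] {key} = {conf.get('scheduler', key)}")
```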
My questions are:
Which tasks are running constantly underneath the Airflow service?
Which parameters (airflow.cfg, helm chart values) affect the task schedule rate?
Which tasks are running constantly underneath the Airflow service?
The processes that would normally be running (depending on your Airflow version) are:
DAG parsing, which is part of the scheduler. This process regularly re-parses your DAG files to determine whether they have changed. It is good practice to keep top-level code in your DAGs to a minimum, as expensive top-level code slows down DAG parsing (see the example DAG below).
If you are using the Astro CLI, you can check how long your DAGs take to parse locally by running astro dev run dags report. Because you are self-hosting Airflow, you can instead run airflow dags report against your running Airflow instance in AKS (for example, from a shell inside a scheduler pod).
The Airflow scheduler’s scheduling loop (covered under the next question).
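To illustrate the point about top-level code, here is a minimal sketch of a DAG that keeps module-level work to imports and definitions only (the DAG id and task names are made up; assumes Airflow 2.4+ and the TaskFlow API):

```python
# minimal_parsing_example.py -- illustrative only; DAG id and task names are invented
import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="minimal_parsing_example",
    start_date=datetime.datetime(2023, 1, 1),
    schedule=None,  # "schedule" requires Airflow 2.4+; older versions use schedule_interval
    catchup=False,
)
def minimal_parsing_example():
    # Everything at module level outside the task bodies runs on every parse,
    # i.e. at least once per min_file_process_interval, so keep it cheap.

    @task
    def fetch_rows() -> list:
        # Expensive work (HTTP calls, DB queries, heavy imports) belongs inside
        # the task, where it only runs when the task instance actually executes.
        return ["row-1", "row-2"]

    @task
    def summarize(rows: list) -> None:
        print(f"got {len(rows)} rows")

    summarize(fetch_rows())


minimal_parsing_example()
```

With this structure, each re-parse only imports the module and registers the task definitions; nothing external is called until a task instance actually runs.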
Which parameters (airflow.cfg, helm chart values) affect the task schedule rate?
Airflow’s scheduler runs in a constant loop attempting to schedule tasks. The rate at which tasks are scheduled is affected by a number of parameters, some of which include: