Which tasks are constantly running on Airflow?

I’m deploying Airflow on Azure Kubernetes Service using Helm. The Airflow DAG and log folders are mounted on blob storage over NFSv3. I noticed a surge of transactions every 15 minutes, even when there’s no activity.
[image: chart of storage transactions]

I checked the configuration reference in the Airflow documentation (link) and changed “min_file_process_interval”, which controls how often DAG files are parsed, to once every 5 minutes, hence the small peaks on the chart.

Here’s some information about my current deployment:

  • Executor: Kubernetes Executor
  • min_file_process_interval: 300 seconds
  • Number of scheduler replicas: 2
  • Number of web servers: 2

My questions are:

  • Which tasks are running constantly underneath Airflow service?
  • Which parameters (airflow.cfg, helm chart values) affect the task schedule rate?

Which tasks are running constantly underneath Airflow service?

The processes that normally run (depending on your Airflow version) are:

  1. DAG Parsing, which is part of the scheduler. This process regularly attempts to reprocess your DAGs to determine if there are changes to them. It is a good practice to keep top-level code to a minimum in your DAGs, as that can affect DAG Parsing.

https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/dagfile-processing.html

If you are using the Astro CLI, you can check how long your DAGs take to parse locally by running astro dev run dags report.

Because you are self-hosting Airflow, you can use airflow dags report if you connect to your running Airflow instance in AKS.

  2. Airflow Scheduler’s Scheduling Loop
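To illustrate the point about top-level code and DAG parsing, here is a rough standalone sketch in plain Python. It is not how Airflow's DAG processor is implemented and it does not replace airflow dags report; it only imports each .py file in a folder once and times it, which is enough to show why module-level work makes every parse slower. The function name time_dag_file_imports and the two demo files are hypothetical, made up for this example.

```python
import importlib.util
import tempfile
import time
from pathlib import Path


def time_dag_file_imports(dag_folder):
    """Import every .py file in dag_folder once and return {file name: seconds}."""
    timings = {}
    for path in sorted(Path(dag_folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        spec.loader.exec_module(module)  # executes all top-level code in the file
        timings[path.name] = time.perf_counter() - start
    return timings


# Two hypothetical files: one does slow work at import time, the other defers it.
demo_folder = Path(tempfile.mkdtemp())
(demo_folder / "slow_top_level.py").write_text(
    "import time\n"
    "time.sleep(0.2)  # stands in for a query or API call at module level\n"
)
(demo_folder / "lean_top_level.py").write_text(
    "import time\n"
    "\n"
    "def fetch():\n"
    "    time.sleep(0.2)  # only runs when something calls fetch()\n"
)

timings = time_dag_file_imports(demo_folder)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
```

The file with the sleep at module level pays that cost on every import, while the other one only pays it when fetch() is actually called. The same pattern applies to database queries or API calls placed at the top of a DAG file.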

Which parameters (airflow.cfg, helm chart values) affect the task schedule rate?

Airflow’s scheduler runs in a continuous loop, attempting to schedule tasks. Whether a task gets scheduled depends on a number of factors, some of which include:

  • DAG schedule
  • Task Dependencies
  • Available Global Parallelism and Pool slots
  • Maximum DAG Run / Task Instance concurrency
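As a small sketch of how settings like these are usually overridden in a containerized deployment: Airflow reads any option from an environment variable named AIRFLOW__{SECTION}__{KEY}. The helper below just builds those names. The section names assume an Airflow 2.x layout (some options have moved between sections across versions, so check the configuration reference for yours), the helper name airflow_env_var is made up for this example, and every value except the 300 seconds from the question is only illustrative. How the variables get injected depends on your Helm chart.

```python
def airflow_env_var(section, key):
    """Build the environment variable name Airflow reads for [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# (section, key) -> value. Sections are assumed from Airflow 2.x;
# values other than min_file_process_interval are illustrative only.
settings = {
    ("scheduler", "min_file_process_interval"): "300",
    ("core", "parallelism"): "32",
    ("core", "max_active_tasks_per_dag"): "16",
    ("core", "max_active_runs_per_dag"): "16",
}

env = {airflow_env_var(section, key): value for (section, key), value in settings.items()}

for name, value in env.items():
    print(f"{name}={value}")
```

You can confirm what a running instance actually picked up with airflow config get-value, for example airflow config get-value core parallelism.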

Read more about scheduling and tuning here, but note that I recommend keeping these settings at their defaults unless you know what you are tuning for and why:
https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#fine-tuning-your-scheduler-performance