If you’re running Airflow on Astronomer and tweaking your resources, you might notice that we set values for “Database Connections” and “Client Connections” alongside the number of AUs allocated to your deployment. The guidelines below explain what those values are.
Available connections are multiples of your total AU. If you scale your deployment via the “Configure” page in the Astronomer UI by increasing the number of AUs allocated across components, you’ll generally increase the number of connections that are created.
There are 2 types of connections:
Database connections = The number of connections that Pgbouncer will open against your deployment’s backend metadata database. Pgbouncer is a lightweight connection pool manager for Postgres that helps protect your deployment’s backend metadata DB from getting throttled.
Client connections = The number of connections that can be opened against Pgbouncer from your Airflow pods. The Webserver, Scheduler, and each Kubernetes Executor worker pod will open a small number (1-5) of connections to the Pgbouncer pod.
Pgbouncer will pool connections from Airflow pods and run them over the configured number of database connections. A database connection is released once a client’s transaction completes, and Pgbouncer then moves on to the next waiting client. If you’re running Astronomer Enterprise, you can see these metrics on the “Database Activity” Grafana dashboard on a per-deployment basis.
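To make the pooling idea concrete, here is a toy model in Python. It is not how Pgbouncer is implemented, and the function name and numbers are made up for illustration. It only shows the general pattern: many clients take turns on a small, fixed set of database connections, and a client waits whenever none is free.

```python
import queue
import threading


def run_pooled(num_clients, db_connections, transactions_per_client=3):
    """Toy model: many clients share a fixed set of database connections.

    Hypothetical illustration only; names and numbers are assumptions,
    not Astronomer or Pgbouncer settings.
    """
    # The pool holds one token per database connection.
    pool = queue.Queue()
    for conn_id in range(db_connections):
        pool.put(conn_id)

    lock = threading.Lock()
    stats = {"in_use": 0, "peak": 0, "completed": 0}

    def client():
        for _ in range(transactions_per_client):
            conn = pool.get()  # blocks until a database connection is free
            with lock:
                stats["in_use"] += 1
                stats["peak"] = max(stats["peak"], stats["in_use"])
            # ... the transaction would run here ...
            with lock:
                stats["in_use"] -= 1
                stats["completed"] += 1
            pool.put(conn)  # hand the connection to the next waiting client

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


if __name__ == "__main__":
    # 20 clients share 4 database connections.
    print(run_pooled(num_clients=20, db_connections=4))
```

However many clients connect, the number of database connections in use at once never exceeds the pool size, which is the protection the proxy gives the backend database.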
Database and client connections are configured as part of the database proxy (Pgbouncer) we deploy with every Airflow deployment. The lower number represents the number of connections the proxy can open to the actual backend database (RDS). The higher number represents the number of connections that can be opened against the proxy from the Airflow pods. Every Airflow pod that connects to the database will create an internal connection pool of 5, so actual usage per pod will be between 0 and 5 against the proxy by default.
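As a rough sanity check, you can estimate the worst-case number of client connections a deployment might open against the proxy. The sketch below assumes the default pool of 5 per pod described above. The function names, pod counts, and the limit of 100 are hypothetical, so substitute the values shown for your own deployment.

```python
# Per the text above: each Airflow pod keeps an internal pool of up to 5 connections.
DEFAULT_POOL_SIZE_PER_POD = 5


def max_client_connections(webservers, schedulers, workers,
                           pool_size=DEFAULT_POOL_SIZE_PER_POD):
    """Worst case: every pod has its whole internal pool open at once."""
    pods = webservers + schedulers + workers
    return pods * pool_size


def fits_within_limit(webservers, schedulers, workers, client_connection_limit,
                      pool_size=DEFAULT_POOL_SIZE_PER_POD):
    """True if the worst case stays at or below the client connection limit."""
    needed = max_client_connections(webservers, schedulers, workers, pool_size)
    return needed <= client_connection_limit


if __name__ == "__main__":
    # Hypothetical deployment: 1 Webserver, 1 Scheduler, 10 worker pods.
    print(max_client_connections(1, 1, 10))
    # The limit of 100 is an illustrative value, not a real Astronomer setting.
    print(fits_within_limit(1, 1, 10, client_connection_limit=100))
```

Because each pod uses between 0 and 5 connections, this is an upper bound rather than a prediction of typical usage.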
This proxy is in place to prevent a single deployment from running away with connections and causing problems for other consumers of your database. It’s similar to the quotas that are applied to Kubernetes namespaces, which prevent any one deployment from running away with resources. That’s also why connection limits and resource quotas scale up together. Open connections also consume file descriptors on the underlying hosts, which are another finite resource to keep contained.
Note: These connections do NOT have any impact on the way you write your DAGs or how many concurrent connections you hold to your own databases. Rather, they’re about how the Webserver, Scheduler, and Workers connect to Astronomer’s Postgres to update the state of variables, DAGs, and tasks.