Terraform to control dag state on/off

Airflow DAGs can be turned on/off at will via the Airflow UI. This is risky, as an accidental mouse click could inadvertently kill a pipeline. I also hit an issue where I renamed a DAG in a PR and, because new DAGs default to off, the renamed DAG came up paused. Luckily I checked and rectified this, but this process should be controlled and under version control, e.g. via Terraform. I’ve done some googling but not found anything on this topic. Can anyone suggest how/if this can be achieved?

Hi @robmarkcole!

I have been on that side of the bridge before :frowning: Accidentally turning off DAGs and renaming existing DAGs but expecting them to be turned on.

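The latter case, however, comes down to a poorly explained Airflow configuration option. There is a setting called dags_are_paused_at_creation in the [core] section of airflow.cfg, which is True by default. A renamed DAG is registered as a brand new DAG, so it starts paused unless you set this to False.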
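To make the setting concrete, here is a minimal sketch of how the effective value could be resolved. It is a simplified stand-in for Airflow's own configuration lookup, not its actual code: the function name new_dags_start_paused and the sample config text are made up for illustration. It relies on two things I'm confident about: Airflow ships with dags_are_paused_at_creation set to True, and an environment variable named AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION overrides the value in airflow.cfg.

```python
import configparser

# Illustrative fragment of airflow.cfg; the setting lives in the [core] section.
AIRFLOW_CFG = """
[core]
dags_are_paused_at_creation = False
"""

ENV_KEY = "AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION"


def new_dags_start_paused(cfg_text, environ):
    """Return True if newly registered DAGs would start paused.

    Hypothetical helper: checks the environment variable first, then the
    config file text, then falls back to Airflow's shipped default (True).
    """
    env_value = environ.get(ENV_KEY)
    if env_value is not None:
        return env_value.strip().lower() in ("true", "t", "1")

    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getboolean("core", "dags_are_paused_at_creation", fallback=True)
```

Calling new_dags_start_paused(AIRFLOW_CFG, os.environ) with the fragment above returns False as long as the environment variable is unset, i.e. new or renamed DAGs would come up switched on. Since airflow.cfg (or the environment that sets the variable) can live in the repo, this part at least is under version control.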

On the topic of tracking whether a DAG is active, that one is more of a gray area. Terraform, an infrastructure orchestration tool, is in my opinion not the correct instrument for managing that information. Currently the flag that determines whether a DAG is on or off (is_paused) is stored in Airflow's metadata database along with the rest of the DAG metadata. A flip of the switch in the UI updates that value.

I would say that whether a DAG is active is more business logic than infrastructure as code. I’m also not sure what you are envisioning in terms of Terraform integration, given there are no supported providers, nor do I see how a custom provider would update the metadata in the database, which is what it would ultimately need to do.

It is impossible (to my knowledge) to track when someone has turned off a DAG, whether intentionally, accidentally, or even maliciously. But I find the flexibility is often required to accommodate users’ changing needs. Someone wants to pause a DAG while upstream services are being fixed. Another person wants to turn it back on, but they aren’t aware of the issues and wonder why it’s off. My point is that there are plenty of reasons to want to change the state, though most of the time we just leave it on forever.

Regardless, I do agree with you that it sucks when those accidents happen. Right now my opinion/solution is to have plenty of alarms and metrics that align with what the team expects. This is more reactive, but it’s what I’ve got, since the only way I can prevent changes to the state is via role permissions. Someone will still have to be admin, but they shouldn’t be burdened with that responsibility.

I hope that gave you some perspective, though I know I didn’t really answer your question. Please do update if you explore the Terraform route. I would like to see how it works for you and how it impacts the user experience.

Hi Alan,
Thanks for the very thoughtful reply! On my local setup I can of course query the Airflow db and track any changes of DAG state, but as our prod setup is on Astronomer this is not possible (db access is unavailable owing to security concerns). So likewise our current approach is monitoring (currently just via the Airflow UI), but we are looking forward to the Datadog integration. I agree Terraform is probably inappropriate, but I'm hoping to hear some better suggestions that at least get the status under version control.
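One way the "status under version control" idea could look, as a rough sketch only: keep the desired paused/unpaused state in a file in the repo and have a small script compare it with the live state. Everything named here is hypothetical (the DAG ids, DESIRED_PAUSED, plan_changes, missing_dags, build_patch_request). It also assumes an Airflow 2.x deployment where the stable REST API (PATCH /api/v1/dags/{dag_id} with an is_paused body) is enabled and reachable, which nothing in this thread confirms for Astronomer; authentication is left out entirely.

```python
import json
import urllib.request

# Hypothetical desired state, committed to the repo and changed only via PR.
# True means "paused" (off), False means "unpaused" (on).
DESIRED_PAUSED = {
    "ingest_daily": False,
    "legacy_export": True,
}


def plan_changes(desired, actual):
    """Return {dag_id: is_paused} for DAGs whose live state differs from the repo."""
    return {
        dag_id: paused
        for dag_id, paused in desired.items()
        if dag_id in actual and actual[dag_id] != paused
    }


def missing_dags(desired, actual):
    """DAG ids listed in the repo but unknown to Airflow, e.g. after a rename."""
    return sorted(set(desired) - set(actual))


def build_patch_request(base_url, dag_id, is_paused):
    """Build (but do not send) the request that would set one DAG's paused flag.

    Assumes the Airflow 2.x stable REST API; add auth headers for real use.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}?update_mask=is_paused"
    body = json.dumps({"is_paused": is_paused}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

The actual dict would come from wherever the live state can be read (the same API, or the db on a local setup). A CI job could either apply the planned changes or just fail loudly when plan_changes or missing_dags return anything, which would have caught the renamed DAG case from the original question.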

You’re welcome, @robmarkcole!

Yeah, the Datadog integration has definitely been a hot topic. I will update you when we have something concrete with officially supported documentation.
