In my use case I need to set up configuration files on the worker nodes for my DAGs to function properly. I am using Astronomer Cloud and, as far as I understand the platform, I currently have two options to set those up:
- Generate the files before or during the image build process in my CI/CD pipeline and COPY them into the image
- Create a specific Airflow DAG that I can run on demand and that fetches the config and sets up the files in the container (a rough sketch of this follows below).
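A minimal sketch of what that on-demand DAG could look like, assuming Airflow 2 import paths; `fetch_config()` and the target path are placeholders I made up, not part of any real setup:

```python
# Hypothetical on-demand DAG that fetches config and writes it onto the
# worker's filesystem. fetch_config() and CONFIG_PATH are placeholders.
import json
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

CONFIG_PATH = "/usr/local/airflow/config/app_config.json"  # assumed target location

def fetch_config():
    # Placeholder: in reality this would call whatever system holds the config.
    return json.loads(os.environ.get("APP_CONFIG_JSON", "{}"))

def write_config_files():
    os.makedirs(os.path.dirname(CONFIG_PATH), exist_ok=True)
    with open(CONFIG_PATH, "w") as f:
        json.dump(fetch_config(), f)

with DAG(
    dag_id="setup_config_files",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,  # run on demand only
    catchup=False,
) as dag:
    PythonOperator(task_id="write_config", python_callable=write_config_files)
```

Note that this only populates whichever worker happens to run the task, and the files are gone after every redeploy, which is exactly the inconvenience described below.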
Putting the config files, with potential secrets in them, into the image does not feel very safe. On the other hand, having a full DAG to do the config does not work well either, as I will have to remember to run it every time I re-deploy. Some other, more convenient approaches that I do not think are possible are:
- Bootstrapping the entrypoint to fetch and set up the configuration on start-up
- SSHing into the container and putting the files there after deployment
What are the alternatives here?
Yes, you are correct that the last two options wouldn't really work for Cloud: you do not have access to SSH into the containers. Typically secrets are stored as Airflow connections through the Airflow UI. Is that a viable solution for you?

Here at Astronomer, we've been brainstorming ways to pull secrets from other systems such as Vault or AWS Secrets Manager. We don't have anything on the roadmap yet, but it's definitely something we are thinking about. So for now I'd recommend baking any non-sensitive config settings into your image, and then manually creating Airflow connections for your sensitive details.

We have seen people create a DAG to sync Airflow connections from a remote source like Vault, but that is not an out-of-the-box solution, and you would have to trigger that DAG every time the secrets changed. I think that could be accomplished with the Airflow REST API, though. A rough sketch of that sync pattern is below.
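For what it's worth, this is roughly the shape of it, assuming HashiCorp Vault with the hvac client and a KV v2 mount; the Vault address, secret path, and connection details are all invented for illustration:

```python
# Sketch of a DAG that syncs an Airflow connection from Vault (KV v2, via hvac).
# The Vault address, secret path, and connection fields are illustrative only.
from datetime import datetime

import hvac
from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python import PythonOperator

def sync_connections_from_vault():
    client = hvac.Client(url="https://vault.example.com", token="...")  # assumed auth
    secret = client.secrets.kv.v2.read_secret_version(path="airflow/my_db")
    data = secret["data"]["data"]

    session = settings.Session()
    # Upsert: drop any existing connection with the same id, then recreate it.
    session.query(Connection).filter(Connection.conn_id == "my_db").delete()
    session.add(Connection(
        conn_id="my_db",
        conn_type="postgres",
        host=data["host"],
        login=data["login"],
        password=data["password"],
    ))
    session.commit()

with DAG(
    dag_id="sync_connections",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,  # triggered only when the secrets change
    catchup=False,
) as dag:
    PythonOperator(task_id="sync", python_callable=sync_connections_from_vault)
```

Triggering it whenever a secret changes would then just be a call to the DAG-trigger endpoint of the REST API.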
Storing the secrets in Airflow connections does not work in my case: my code needs the JSON configuration files, secrets included, to be present on the filesystem in order to operate.
In the end we solved the situation using the KubernetesPodOperator to run the tasks that required physical credential files on the filesystem, inside containers whose entrypoint we could control (roughly along the lines of the sketch below), though it feels like quite a convoluted workaround…
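For anyone who lands here with the same problem, this is approximately the pattern; the image, secret name, and mount path are specific to our setup and purely illustrative, and the import paths are for the cncf.kubernetes provider (they vary between Airflow/provider versions):

```python
# Task that runs in a pod we control, with credentials mounted as files
# from a Kubernetes Secret. All names below are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from airflow.providers.cncf.kubernetes.secret import Secret

# Mount every key of the Kubernetes Secret "app-credentials" as a file under /etc/app
credentials = Secret(
    deploy_type="volume",
    deploy_target="/etc/app",
    secret="app-credentials",
)

with DAG(
    dag_id="run_with_credential_files",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    KubernetesPodOperator(
        task_id="run_app",
        name="run-app",
        namespace="default",
        image="my-registry/my-app:latest",  # image whose entrypoint we control
        secrets=[credentials],
        cmds=["/entrypoint.sh"],  # entrypoint that expects /etc/app/*.json to exist
    )
```

Because the credentials come from a Kubernetes Secret mounted at pod start, nothing sensitive has to be baked into the image or re-created after a redeploy.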
Have you considered the possibility of supporting post-start scripts? Similar to what you already do for requirements.txt and packages.txt, but with actual bash code that gets executed right after start-up. I think that would be useful.
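To make the idea concrete, something along these lines (entirely hypothetical, since no such hook exists today; sketched in Python rather than bash, and the URL and target path are invented):

```python
# Entirely hypothetical post-start hook the platform could run once the
# container is up: fetch the JSON config (secrets included) and write it
# to disk before any task runs.
import json
import os
import urllib.request

CONFIG_URL = os.environ["CONFIG_URL"]  # e.g. an internal secrets endpoint
CONFIG_PATH = "/usr/local/airflow/config/app_config.json"

with urllib.request.urlopen(CONFIG_URL) as resp:
    config = json.load(resp)

os.makedirs(os.path.dirname(CONFIG_PATH), exist_ok=True)
with open(CONFIG_PATH, "w") as f:
    json.dump(config, f)
```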