This might be a stupid question, but I’m new to Airflow and I’ve only gotten Astro working using “astro dev start”, which launches the 4-container instance. I’d like to add a container of my own to the cluster for my DAGs to interact with via custom API listeners I’ve built. Are there instructions for how I can do this? Or where the Docker compose file used under the hood is located? Thanks!
Hey @Phobia42
Welcome to the Astronomer Forum and to Airflow.
And no question is stupid!
To answer your question, I would first like to understand your use case properly:
- What service do you want to run in the separate container?
- Are the DAGs having any trouble interacting with the custom API listeners as they are set up now?
- Are you currently keeping your DAGs in the `dags` folder?
Astro CLI automatically creates your Airflow project with a `dags` directory and other required components like `include`, `packages.txt`, and `requirements.txt`.
You can keep your DAGs in the `dags` folder and the extra classes that interact with your custom API listeners in the `include` directory, and then import those classes in your DAGs.
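For example, a minimal sketch of that pattern (the `include.api_client` module and its `fetch_data` function are placeholders for your own listener code; the `include` folder is importable from DAGs in a standard Astro project):

```python
# dags/example_api_dag.py
# Minimal sketch: `include.api_client` and `fetch_data` are hypothetical
# placeholders for your own code kept in the include/ directory.
from pendulum import datetime

from airflow.decorators import dag, task
from include.api_client import fetch_data


@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def example_api_dag():
    @task
    def call_listener():
        # Delegate the actual API interaction to the helper in include/
        return fetch_data()

    call_listener()


example_api_dag()
```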
You can read this guide for more information: Manage Airflow code | Astronomer Documentation
Hope this helps!
Thanks
Manmeet
Thanks for the reply! I’m trying to make the transition from BA to Data Engineer and am working on developing some of the skills I’m missing, such as Airflow (although I’m focusing more on using it than on configuring it). I recently completed a POC that takes two Python applications I wrote (one for downloading data via API, and one for loading flat files to a database) and orchestrates them with Airflow. The programs are all somewhat complex and self-contained, so just adding them to the “include” folder might not be suitable.
I’ve got everything working locally, but only when Astro is running in its containers via Docker and my apps are running in listener mode on the host computer. I’m not quite at the point of deploying it all to the cloud with dedicated IP addresses, etc., but I wanted it all to be more self-contained. I figured it would be simplest to make my apps a 5th container in the Astro cluster, so they would launch through the command line and could leverage the cluster network for communication.
I realize this isn’t a proper production-ready architecture, but I hoped it would let me focus more on writing DAGs than on configuring infrastructure. Once I’m more comfortable with how to use Airflow (as if I were working for a company that already had an established infrastructure in place), I can learn the inner workings over time. I also welcome any alternative approaches as long as they are easy to set up. I appreciate any feedback.
Hey @Phobia42
Apologies for the delay in responding.
If you have complex, reusable operations that need to be carried out, I would suggest creating custom operators. You can follow the steps below:
- Your applications can be packaged as separate Python packages
- You can then easily install them via `requirements.txt` in your Airflow project
- You can write your own custom operators that use these packages, or use them directly in your DAGs
- Custom operators can be organized in a sub-folder of the `include` directory (a rough sketch follows below)
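For illustration, here is a rough sketch of such an operator, assuming your download application has been packaged as a hypothetical `my_downloader` module installed via `requirements.txt`:

```python
# include/custom_operators/download_operator.py
# Rough sketch: `my_downloader` and its `download()` function are placeholders
# for whatever your packaged application actually exposes.
from airflow.models.baseoperator import BaseOperator

import my_downloader  # hypothetical package installed via requirements.txt


class DownloadOperator(BaseOperator):
    """Wraps the packaged download application so a DAG can run it as a task."""

    def __init__(self, endpoint: str, **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint

    def execute(self, context):
        # Delegate the real work to your application code
        return my_downloader.download(self.endpoint)
```

A DAG in `dags/` could then import it with `from include.custom_operators.download_operator import DownloadOperator` and use it like any built-in operator.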
Let me know if this helps.
Thanks
Manmeet