Hi, I'm new to Airflow and have been exploring it via astro-cli.
Thanks for the streamlined setup and the detailed tutorials.
I've been looking through the docs, but I can't find how to deploy Airflow via astro-cli for production on my own server.
By that I mean: I have my own EC2 servers in the cloud, and I'd like to self-host my production code rather than use the Astronomer platform.
Kindly let me know if that's possible, and if so, how.
As of now, I can't opt for Astronomer hosting until I've done a comparative assessment of the effort-reward ratio between self-hosting and Astronomer hosting.
Welcome to the Airflow community! Glad you liked the tutorials.
To answer your question: the Astro CLI uses the LocalExecutor, which is recommended for testing Airflow DAGs locally or in a dev or test environment. We do not recommend the LocalExecutor for a production environment unless you have only a handful of DAGs and want to run or test them in a temporary production setup.
For more information about executors, see different available Executors.
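For reference, in a plain (non-Astro) Airflow deployment the executor is selected in `airflow.cfg` or overridden via an environment variable. A minimal sketch, using CeleryExecutor as an example of a production-oriented choice:

```shell
# In airflow.cfg:
#   [core]
#   executor = CeleryExecutor
#
# Or equivalently via environment variable, which takes precedence
# over airflow.cfg for the Airflow processes it is set on:
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
```

Whichever executor you pick, it must be set consistently for the scheduler and any workers.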
If you want to run the Astro CLI on a remote machine, all the available Airflow services can be exposed on 0.0.0.0. You can enable or disable this using the expose_port setting of the Astro CLI. See Configure Astro CLI for details.
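A sketch of what that looks like in practice. The `astro config set` and `astro dev restart` commands are standard Astro CLI commands, but the exact setting key below is taken from this thread; verify it against `astro config list` and the Configure Astro CLI docs for your CLI version:

```shell
# Bind the local Airflow services to 0.0.0.0 instead of localhost
# (setting name per this thread; confirm for your CLI version).
astro config set airflow.expose_port true

# Restart the local project so the new config takes effect.
astro dev restart
```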
Thanks @manmeet for the quick reply.
I'd kind of figured out the part about using the LocalExecutor and running on a remote machine, but that would hardly scale.
I was wondering whether astro-cli offers something like the roadmap native Airflow provides for a full-scale production deployment. Imagine dozens of DAG files, each executing different ETL tasks and moving GBs of data in a single day.
Is there a way to achieve the same with astro-cli without using the Astronomer platform? Or are they intrinsically tied together?
The Astro CLI can be used both for local development and with the Astronomer platform, so you can use it independently of the platform to develop and test your DAGs. But to use it to deploy to a production environment, you must be a user of the Astronomer platform.
Have you tried Astro's pay-as-you-go? It might be worth a try, and you can auto-scale up and down as per your DAGs' run schedules. The first 14 days are free!
Thanks @manmeet, that clarifies my doubt.
I guess I'll just have to go back to my management to discuss it further.
To give more insight: Airflow is a distributed system, and like all distributed systems, there are a number of considerations involved in getting it running in production.
It is not hard to run, but do consider that at a certain scale it could take a whole person full-time to keep it happy and healthy and scaling.
Airflow often runs your most critical jobs, so I’d suggest that it’s worth investing in having a healthy Airflow.
I am an Astronomer Employee (so may be biased) but I have run Airflow OSS myself on Kubernetes before I worked here. I would recommend Astro over that experience for most people.
Running Airflow on Kubernetes would make scaling and stability far easier, but then you also need a person to keep that Kubernetes cluster happy, stable, and scalable on top of Airflow.
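If you do go the self-managed Kubernetes route, the official Apache Airflow Helm chart is the usual starting point. A minimal sketch (the `airflow` release and namespace names here are arbitrary placeholders; a real deployment also needs values for the database, DAG syncing, secrets, and so on):

```shell
# Add the official Apache Airflow Helm chart repository.
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install the chart into its own namespace; release and
# namespace names are up to you.
helm install airflow apache-airflow/airflow \
  --namespace airflow --create-namespace
```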
Astro does this all for you - as we manage the Kubernetes Cluster and the Airflow.
Our trial is free for 14 days - so you risk nothing by signing up for it. Feel free to reach out if you want to talk more about different ways of running Airflow - I’m happy to discuss other options if Astro just doesn’t make sense for you at the moment.