Currently we are just triggering Spark jobs on AWS Glue and some Lambdas. Since we are not processing data directly in Airflow, I’m wondering if we need Celery. Do you think the LocalExecutor is best for this type of workload?
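For context, the kind of "trigger and wait" task described here can be sketched roughly as below. This is only an illustration, not anyone's actual DAG code: `run_glue_job` and its parameters are hypothetical names, and `glue` is assumed to be a boto3 Glue client (or anything exposing the same `start_job_run` / `get_job_run` calls). The point is that the Airflow side only polls; the heavy lifting happens in Glue.

```python
import time

# Glue job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue, job_name, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job and block until it reaches a terminal state.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    `run_glue_job`, `poll_seconds` and `sleep` are hypothetical names
    introduced for this sketch.
    """
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        # The worker is idle here; all real processing runs inside Glue.
        sleep(poll_seconds)
```

A function like this could be called from a PythonOperator; the task uses almost no CPU or memory on the Airflow machine, but it does hold a worker slot for as long as the Glue job runs.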
It really depends on your use case as it stands and as you expect it to evolve.
The obvious risk with the LocalExecutor is that if something happens to your machine, your tasks will come to a standstill until that machine is back up. So while the LocalExecutor is a great way to save on engineering resources for testing, even with a heavy workload, we generally recommend going with Celery for running DAGs in production, especially if you’re running anything that’s time sensitive. With that said, you’ll find plenty of use cases out there from folks who ran quite a bit on a LocalExecutor before switching to Celery, so it’s certainly not a forbidden practice.
One notable feature that the Celery Executor gives us: Airflow has what we call a “Worker Termination Grace Period” that helps minimize task disruption upon deployment by continuing to run tasks for a set number of minutes (configurable via the Astro UI) after you push up a deploy. Under a LocalExecutor, your workers will restart immediately upon deployment regardless of whether or not tasks were mid-execution, which could be problematic if any of your DAGs are on a tight schedule. If you’re careful to plan code pushes and deploys you should be ok, but it’s something of note that our heavy users certainly appreciate.
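To make the grace period trade-off concrete, here is a toy model of it. It is a rough sketch, not how Astronomer implements the feature: `interrupted_on_deploy` is a hypothetical helper, and the task names and durations are made up. It just asks which in-flight tasks would still be running when the workers go away.

```python
def interrupted_on_deploy(remaining_minutes, grace_minutes):
    """Return the names of running tasks that a deploy would cut off.

    remaining_minutes: dict of task name -> minutes the task still needs
                       at the moment of the deploy.
    grace_minutes:     how long workers keep running after the deploy;
                       0 models an immediate restart.
    """
    return sorted(
        name
        for name, minutes_left in remaining_minutes.items()
        if minutes_left > grace_minutes
    )


# Hypothetical tasks that are mid-execution when a deploy is pushed.
running = {"glue_etl": 12, "lambda_trigger": 1, "report": 45}

# Immediate restart: every in-flight task is interrupted.
print(interrupted_on_deploy(running, grace_minutes=0))

# 30-minute grace period: only the task needing 45 more minutes is cut off.
print(interrupted_on_deploy(running, grace_minutes=30))
```

With short trigger-style tasks, even a modest grace period covers most of what is in flight, which is why planning deploys around the schedule is the main workaround without one.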
Keep in mind that you’re not locked into an Executor either way - you’re free to adjust from one to the other as needed at any time (with proportional changes to your Astronomer bill).
Quick update on this one - we just released a thorough breakdown of the 3 primary Airflow Executors (Celery, Local, and the upcoming Kubernetes Executor) that should help.
To use the upcoming Kubernetes Executor, will I need to have a k8s cluster myself?
Hi @edbizarro - nope! You won’t need to set up any Kubernetes backend to run the Kubernetes Executor on Astronomer Cloud once it’s out on our platform - those pods will live within Astronomer’s environment and be looked after by our team.
Managed usability is precisely the value add we’re shooting for.