Having issues scaling airflow


I’m having issues scaling Airflow beyond 700 task instances, using the LocalExecutor and MySQL. The PIDs are getting killed with no other message. I am now trying the DaskExecutor, and everything seems to run without errors, but now the DAG runs aren’t being scheduled. It just sits there after being triggered.
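Roughly, my executor setup looks like this (a minimal sketch; the scheduler address is a placeholder for my actual Dask scheduler):

```ini
# airflow.cfg (relevant fragment only)
[core]
executor = DaskExecutor

[dask]
# address of an already-running dask-scheduler (placeholder)
cluster_address = 127.0.0.1:8786
```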

I followed this:

I also had to set queue to None on the tasks and run the scheduler with the --do_pickle option.
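Concretely, that second part is just this command-line fragment (assuming the Airflow 1.10 CLI, where --do_pickle is the long form of -p):

```
# pickle DAGs and ship them to workers, so the Dask workers
# don't need the DAG files on their local filesystem
airflow scheduler --do_pickle
```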

I’d like to try Dask with LSFCluster.

Help, thoughts?

Our team at Astronomer doesn’t have much, if any, experience w/ DaskExecutor or LSFCluster.

  1. Have you considered using CeleryExecutor with Kubernetes and KEDA? We have a lot of experience scaling that, and the Celery workers scale to zero.

  2. Also Postgres w/ PgBouncer is a much better DB setup vs. MySQL. With Airflow 2.0, the difference may even become more pronounced, as Postgres has some features that the scheduler upgrades will take advantage of.
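For example, pointing Airflow at PgBouncer rather than directly at Postgres is just a connection-string change (a sketch; host, credentials, and pool size below are placeholders, and 6432 is PgBouncer’s default listen port):

```ini
# airflow.cfg fragment (placeholder credentials)
[core]
sql_alchemy_conn = postgresql+psycopg2://airflow:secret@pgbouncer.example:6432/airflow
# keep each Airflow process's own pool small; PgBouncer does the real pooling
sql_alchemy_pool_size = 5
```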

Happy to chat about it if you’re interested, feel free to grab a slot on my cal https://calendly.com/ryw/30min


Thanks for the response! Yes, I’d like to give CeleryExecutor a try. Do you have any straightforward documents for setting up Celery?

You can test the Airflow Helm chart locally with Celery workers by following this walkthrough https://github.com/apache/airflow/tree/master/chart#walkthrough-using-kind
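The gist of the walkthrough is something like this (a rough command sketch, not the exact steps; the release name and values flag are assumptions, so follow the linked doc for details):

```
# create a local Kubernetes cluster with kind
kind create cluster

# from a checkout of apache/airflow, install the chart with Celery workers
helm install airflow ./chart --set executor=CeleryExecutor

# watch the webserver, scheduler, workers, redis, and postgres pods come up
kubectl get pods
```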

We’re very Kubernetes-centric at Astronomer, because we love the autoscaling + stability it provides.

For production, we recommend Astronomer Certified (https://www.astronomer.io/docs/ac/v1.10.10/get-started/production/) which you can run yourself for free – or use one of our commercially-supported products Astronomer Enterprise (https://www.astronomer.io/docs/enterprise/) or Astronomer Cloud (https://www.astronomer.io/cloud/).

With Celery, how many task instances do you think I can scale to, max?

Also, I am not clear on how a messaging queue helps as an executor. Can you elaborate?

Also, does Celery work hand in hand with Kubernetes?

Yes, we run all our Celery workers on Kubernetes.

If you use the KEDA option, the number of Celery workers will autoscale depending on how many tasks are waiting for work. If you’d like, we could jump on a call to demo this for you.
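To illustrate the idea (a simplified sketch, not KEDA’s actual implementation; the slot and cap numbers are assumptions): KEDA watches how many tasks are waiting and sets the Celery worker replica count accordingly, dropping to zero when the queue is empty.

```python
import math

def desired_workers(queued_tasks: int, slots_per_worker: int = 16,
                    max_workers: int = 10) -> int:
    """KEDA-style scaling rule: zero workers when idle, otherwise just
    enough workers to cover the queued tasks, capped at a maximum."""
    if queued_tasks <= 0:
        return 0  # scale to zero -- no idle workers burning resources
    return min(max_workers, math.ceil(queued_tasks / slots_per_worker))
```

So 17 queued tasks with 16 slots per worker would spin up 2 workers, and an empty queue scales everything down to 0.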

You can also see Daniel talk about KEDA here https://youtu.be/YLsGVFB8Pws?t=1688

Maybe we can do a call later today or tomorrow with a cohort of mine.

Hi ryw, I had an urgent meeting come up and I wasn’t able to attend today’s session. Do you have any other time slots today? I apologize.