When should I use the Kubernetes Executor over the Celery and Local Executors?

For which cases should I use the Kubernetes Executor over the Celery and Local Executors?

Out of the box, Astronomer supports the Local, Celery, and Kubernetes executors.

Local:

  • With the Local executor, everything runs in the same pod as the scheduler (a “single box” approach).
  • Because of that, the Local executor is not very resource-intensive and is a good fit for dev environments or other “lightly used” environments.
  • The flip side shows up when you deploy code on the Astronomer platform: because tasks run in the scheduler’s pod, a code push interrupts them. Running tasks are marked as zombies and go into retry (if retries are configured).
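To make the deploy behavior concrete, here is a toy model in plain Python (not Airflow's scheduler code) of what a code push means under the Local executor: every in-flight task is interrupted at once, and it only comes back if it still has retries left. `Task` and `simulate_local_deploy` are hypothetical names; the outcome strings mirror Airflow's `up_for_retry` and `failed` task states.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical stand-in for a running task instance."""
    task_id: str
    retries: int     # retries configured on the task
    try_number: int  # 1 on the first attempt


def simulate_local_deploy(running: List[Task]) -> Dict[str, str]:
    """Toy model: tasks share the scheduler's pod, so a code push
    interrupts all of them; each interrupted attempt counts as a failure."""
    outcome = {}
    for t in running:
        # A retry is only possible while the attempt number is within the retry budget.
        if t.try_number <= t.retries:
            outcome[t.task_id] = "up_for_retry"
        else:
            outcome[t.task_id] = "failed"
    return outcome


print(simulate_local_deploy([Task("extract", retries=2, try_number=1),
                             Task("load", retries=0, try_number=1)]))
# → {'extract': 'up_for_retry', 'load': 'failed'}
```

In other words, on the Local executor every deploy costs you the in-flight attempt of every running task; setting `retries` only decides whether that attempt is replayed.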

Celery:

  • With the Celery executor, you run dedicated worker pods for your tasks.
  • You can change the number of worker pods as well as the resources allocated to each one.
  • On Astronomer, all workers in a given deployment share the same configuration.
  • The Celery executor also gives your worker pods access to ephemeral storage.
  • Deploys are handled gracefully. When you push code, running tasks continue until the worker termination grace period expires; any task still running at that point is marked as a zombie.
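The grace-period behavior can be sketched as a simple rule: a running task survives a deploy if it finishes before the old worker is terminated. This is a toy model, not Astronomer's implementation; `simulate_celery_deploy` is a hypothetical name, and the 10-minute grace period in the usage line is an illustrative value, not a platform default.

```python
from typing import Dict


def simulate_celery_deploy(remaining_seconds: Dict[str, float],
                           grace_period_seconds: float) -> Dict[str, str]:
    """Toy model of a code push under the Celery executor.

    remaining_seconds maps each running task to the time it still needs.
    Old workers keep running until the termination grace period expires:
    tasks that finish in time complete normally, the rest become zombies.
    """
    return {
        task_id: "finished" if remaining <= grace_period_seconds else "zombie"
        for task_id, remaining in remaining_seconds.items()
    }


# Hypothetical 10-minute grace period.
print(simulate_celery_deploy({"short_report": 120, "nightly_backfill": 5400},
                             grace_period_seconds=600))
# → {'short_report': 'finished', 'nightly_backfill': 'zombie'}
```

Short tasks ride out a deploy untouched, while anything that outlasts the grace period is still interrupted, which is why long-running tasks push you toward the Kubernetes executor.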

Kubernetes:

  • Each task on the Kubernetes executor gets its own pod, so you can pass an executor_config in the task’s parameters to assign resources at the task level:
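```python
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path

# Sample config: memory uses byte suffixes ("8Gi"); CPU is in cores ("10") or millicores ("500m")
test_config = {"KubernetesExecutor": {
    "request_memory": "8Gi", "limit_memory": "8Gi",
    "request_cpu": "10", "limit_cpu": "10",
}}
...

# Pass config into task: only this task's pod gets the resources above.
# `h` is assumed to be a helper module imported elsewhere in the DAG file.
run_compute = PythonOperator(
    task_id='run_model',
    provide_context=True,
    executor_config=test_config,
    python_callable=h.jira_functions.jira_completed_tickets,
)
```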


  • Since each task runs in its own pod, it is managed independently of code deploys. This is great for long-running tasks or environments with many users, as users can push new code without fear of interrupting running tasks.

This makes the Kubernetes executor the most fault-tolerant of the three during deploys, as running tasks are not affected when code is pushed.

  • However, because each task runs in its own pod, tasks may take a little longer to start.
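Kubernetes expresses CPU in cores or millicores (`"2"`, `"500m"`) and memory with byte suffixes (`"512Mi"`, `"8Gi"`), and mixing the two up (for example a CPU request of `"10Gi"`) is an easy mistake. A minimal sketch of a helper that checks quantities before building a per-task `executor_config` follows. Assumptions: `make_k8s_executor_config` is a hypothetical name, the regular expressions only approximate the full Kubernetes quantity syntax, and the dict shape is the Airflow 1.10 `{"KubernetesExecutor": {...}}` format used in the post (newer Airflow versions use a `pod_override` key with a Kubernetes pod object instead).

```python
import re
from typing import Dict

# Simplified approximations of Kubernetes quantity syntax (assumption, not the full spec).
_CPU_RE = re.compile(r"(?:\d+(?:\.\d+)?|\d+m)")          # "10", "0.5", "500m"
_MEM_RE = re.compile(r"\d+(?:Ki|Mi|Gi|Ti|K|M|G|T)?")     # "512Mi", "8Gi", "1G"


def make_k8s_executor_config(request_memory: str, limit_memory: str,
                             request_cpu: str, limit_cpu: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: validate quantities, then build an Airflow 1.10-style executor_config."""
    for name, value in (("request_memory", request_memory), ("limit_memory", limit_memory)):
        if not _MEM_RE.fullmatch(value):
            raise ValueError(f"{name}={value!r} does not look like a memory quantity")
    for name, value in (("request_cpu", request_cpu), ("limit_cpu", limit_cpu)):
        if not _CPU_RE.fullmatch(value):
            raise ValueError(f"{name}={value!r} does not look like a CPU quantity")
    return {"KubernetesExecutor": {
        "request_memory": request_memory,
        "limit_memory": limit_memory,
        "request_cpu": request_cpu,
        "limit_cpu": limit_cpu,
    }}


# A light sensor and a heavy compute task can get very different pods.
sensor_config = make_k8s_executor_config("256Mi", "256Mi", "250m", "500m")
heavy_config = make_k8s_executor_config("8Gi", "8Gi", "10", "10")

try:
    make_k8s_executor_config("8Gi", "8Gi", "10Gi", "10Gi")
except ValueError as err:
    print(err)  # → request_cpu='10Gi' does not look like a CPU quantity
```

Each dict can then be passed as `executor_config=` on the relevant task, which is what lets a cheap sensor and a heavy downstream task run side by side in one DAG.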

In summary, the Celery executor is a great fit for any environment where the tasks are “similar” and a single worker configuration fits all of them, or for tasks that need to start quickly (since the workers are “always on”).

The Kubernetes executor is great for DAGs whose tasks have very different requirements (e.g., the first task may be a sensor that needs only a few resources, while the downstream tasks have to run on your GPU node pool with a higher CPU request). It’s also great for environments with long-running tasks and users pushing code while jobs are running, since there is no grace period after which running tasks are cut off.
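The trade-offs in the summary can be condensed into a rule of thumb. The sketch below is a hypothetical helper (`Workload` and `suggest_executor` are illustrative names, not part of Airflow or Astronomer), and the order of the checks is an assumption: a real deployment may weigh pod start-up latency against per-task sizing differently. The returned strings are the executor names Airflow accepts for its `executor` setting.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a deployment's needs."""
    lightly_used: bool             # dev or other lightly used environment
    heterogeneous_resources: bool  # tasks need very different CPU/memory/node pools
    long_running_tasks: bool       # tasks that could outlast a deploy grace period
    deploys_while_running: bool    # users push code while jobs are running


def suggest_executor(w: Workload) -> str:
    """Rule-of-thumb encoding of the trade-offs; not an official decision tree."""
    if w.lightly_used:
        # Single-box approach: cheapest, but deploys interrupt running tasks.
        return "LocalExecutor"
    if w.heterogeneous_resources or w.long_running_tasks or w.deploys_while_running:
        # Per-task pods: per-task resources, and running tasks survive code pushes.
        return "KubernetesExecutor"
    # Similar tasks that fit one worker size; always-on workers start tasks quickly.
    return "CeleryExecutor"


print(suggest_executor(Workload(lightly_used=False, heterogeneous_resources=True,
                                long_running_tasks=False, deploys_while_running=False)))
# → KubernetesExecutor
```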

In the near future, Astronomer will have an option for KEDA autoscaling on Celery, combining a lot of the great features between Kubernetes executor and Celery executor.
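KEDA scales a Kubernetes workload from an external metric, so Celery workers can grow when work piles up and shrink when it does not. A common heuristic, assumed here rather than taken from Astronomer's implementation, is to size the pool from the number of running plus queued tasks divided by how many tasks each worker runs concurrently. `desired_celery_workers` and the min/max bounds are hypothetical.

```python
import math


def desired_celery_workers(running: int, queued: int, worker_concurrency: int,
                           min_workers: int = 0, max_workers: int = 10) -> int:
    """Queue-based autoscaling heuristic (assumption): enough workers to give
    every running or queued task a slot, clamped to [min_workers, max_workers]."""
    if worker_concurrency <= 0:
        raise ValueError("worker_concurrency must be positive")
    needed = math.ceil((running + queued) / worker_concurrency)
    return max(min_workers, min(max_workers, needed))


# Made-up numbers: a big nightly batch vs. a quiet daytime period.
print(desired_celery_workers(running=16, queued=40, worker_concurrency=16))  # → 4
print(desired_celery_workers(running=1, queued=0, worker_concurrency=16))    # → 1
```

With `min_workers=0`, an idle deployment scales down to no workers at all, while the `max_workers` cap keeps a burst of queued tasks from provisioning an unbounded number of pods.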


Hi virajparekh,
do you already know when KEDA will be available for Astronomer users? I would really like to use the function, because I need several workers to process my Airflow Tasks at night. During the day, however, the workers have little to do.

Hi @Jonnyblacklabel - we are doing some final testing around it. Which customer are you with? We can make sure to keep you in the loop around beta testing and timelines.

Hi virajparekh, thanks for your answer. I’m with get:traction.
I registered on the forum with my private GitHub account.


Hey @virajparekh,
I just wanted to ask if there’s been any news on the KEDA Operator?
Thank you 🙂

Hi @virajparekh, is there anything new on KEDA for Astronomer Cloud?
Cheers 🙂

Hi @Jonnyblacklabel! KEDA is still experimental here at Astro but we’re working on deeper testing to enable it on Astronomer Cloud. We’ll update this post as soon as that’s the case!

Hey @paola,
thank you for the update 🙂