Problem scaling KubernetesExecutor with PVs on EKS

I am running Airflow with the KubernetesExecutor on AWS EKS. We are using AWS EFS drives to support both the DAGs folder and logging. (We also mount an EFS drive to some worker pods for persistent storage.) The drives are being mounted using PersistentVolumes and PersistentVolumeClaims.
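For reference, here is a minimal sketch of that kind of EFS-backed PV/PVC (the names, the efs-sc storage class, and the fs-12345678 filesystem ID are placeholders, not our real values):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv           # placeholder name
spec:
  capacity:
    storage: 5Gi                  # required by the API, not enforced by EFS
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc        # placeholder StorageClass for the EFS CSI driver
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678     # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc          # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi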

Everything is functioning correctly, but the solution won't scale. As the number of task pods increases, the time required to mount the volumes onto the pods increases in a near-linear fashion. The pods remain in the Pending phase and show FailedMount events in the output of kubectl describe pod.

Eventually, tasks created with the KubernetesPodOperator begin to fail with “Pod took too long to start” exceptions (even when I set the timeout to more than 6 minutes).

The volumes are being mounted using the EFS CSI driver. However, I have also tried defining the PVs with a hostPath, with the EFS volumes mounted onto the underlying EC2 instances, and the outcome is the same.
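The hostPath variant looks roughly like this (again a sketch, assuming EFS is already mounted at a placeholder path /mnt/efs on every node; access modes are not enforced for hostPath but must match the claim):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-hostpath     # placeholder name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/efs/dags           # placeholder path where EFS is mounted on the EC2 instance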

Any comments are appreciated.

Hi @jimmajure,

Can I ask why you are choosing to mount via a volume vs. baking the DAGs into the image? In terms of speed, you’ll get the best performance if the DAGs are local.

As for why you are running into these issues, this sounds like a scaling problem in EFS. To test this, could you try to launch 100/1000 pods NOT via Airflow and have them all mount the same EFS directory? If you can't reproduce the problem that way, I would want to investigate further, but volume-mounting failures shouldn't have anything to do with Airflow.
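A rough sketch of what I mean, assuming you already have a PVC (here called efs-test-pvc) bound to the shared EFS volume; the names and image are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: efs-mount-test            # placeholder name
spec:
  completions: 100                # total number of pods to run
  parallelism: 100                # run them all at once to stress the mount path
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: test
          image: busybox
          command: ["sh", "-c", "ls /efs && sleep 30"]   # touch the mount, then exit
          volumeMounts:
            - name: efs
              mountPath: /efs
      volumes:
        - name: efs
          persistentVolumeClaim:
            claimName: efs-test-pvc    # placeholder PVC bound to the shared EFS volume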

Hi @dimberman, thanks for your reply.

First, I agree that this is not an Airflow problem, per se. It clearly is an issue with K8S and/or EKS.

Regarding scaling and EFS, I have worked with AWS support. They've looked at the EFS drives, hosts, network traffic, etc., and they don't see any specific issues. They (AWS support) have tried to scale up pod creation mounting a common EFS drive and have not been able to reproduce the issue.

Maybe a better way to phrase the question is whether anyone has had success scaling up Airflow on EKS using the KubernetesExecutor. If so, how have you managed DAGs and logs?

We have also implemented a container-chaining approach that passes data from one task to another on an external drive, in this case an EFS drive.

We have multiple very large-scale EKS users on the Astronomer platform. I can't speak to how much they've scaled up individual Airflow instances, but we haven't seen any issues like this (though on our platform we bake the DAGs into the image and use Elasticsearch for logging).

Can you try launching Airflow with either the DAGs baked into the image or without EFS for logging, and see if the issue persists?

I will be trying several things.

First, I will try to skip the PersistentVolume/PersistentVolumeClaim mechanism altogether. In the Airflow Kubernetes config section, I'll use the host option to provide a path on the host itself, and I'll mount the EFS drives directly onto the host. This might help if the issue is the PV mechanism in K8S, which is possible.

If this approach doesn’t show any signs of success, I’ll try eliminating EFS altogether, although we do need some form of shared persistent disk.

I prefer not to bake the DAGs into the container if I can avoid it, because it makes updating the DAGs much more cumbersome. We have CI/CD tools that can update the DAG files on the EFS drive very quickly.

Thanks for your replies. I appreciate the advice.

My pleasure, let me know how it goes 🙂

Quick update…

I eliminated the use of PersistentVolumes and PersistentVolumeClaims and that solved the problem.

I’m still seeing a few pod scheduling issues that don’t quite make sense to me, but I’m sure I’ll figure it out.

Thanks, again.


We are also planning to use EFS-mounted DAGs for Airflow. Could you explain how you mounted them in the EKS cluster without using a PV and PVC? Also, did you see any improvement in performance with the approach you described?

What we discovered was that mounting EFS drives via PVs/PVCs, in our specific use case, places a high overhead on the K8S infrastructure. We have lots of relatively short-lived jobs, which means that mounting and unmounting drives occurs quite frequently. Mounting and unmounting PVCs requires API calls from the kubelet, and those calls are rate-limited.

The recommendation we received was to increase the values for the following kubelet parameters:

kubeletExtraConfig:
    kubeAPIQPS: 501
    kubeAPIBurst: 1001
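For anyone wondering where that snippet goes: kubeletExtraConfig is an eksctl nodegroup setting, so in an eksctl cluster config it would sit roughly like this (a sketch; the cluster name, region, and nodegroup details are placeholders):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: airflow-cluster        # placeholder cluster name
  region: us-east-1            # placeholder region
nodeGroups:
  - name: airflow-workers      # placeholder nodegroup name
    instanceType: m5.xlarge    # placeholder instance type
    desiredCapacity: 3         # placeholder size
    kubeletExtraConfig:
      kubeAPIQPS: 501          # kubelet-to-API-server QPS limit (value recommended to us)
      kubeAPIBurst: 1001       # kubelet-to-API-server burst limit (value recommended to us)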

To be honest, we have not done this in production. Instead, we simply mount the EFS drive onto the EC2 worker instances and mount it into the pods using a hostPath. This is clearly not ideal; using PV/PVC would be much better, as it abstracts the containers/pods from the underlying “hardware”.
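For illustration, a task pod ends up with a volume along these lines (a sketch; the image, the container mount path, and the /mnt/efs/dags host path are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: airflow-task-example              # placeholder; in practice this is the generated task pod
spec:
  containers:
    - name: base
      image: apache/airflow               # placeholder image
      volumeMounts:
        - name: dags
          mountPath: /usr/local/airflow/dags   # placeholder DAGs path inside the container
          readOnly: true
  volumes:
    - name: dags
      hostPath:
        path: /mnt/efs/dags               # placeholder; EFS is mounted here on every worker node
        type: Directory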

Also, be aware of the throughput model for EFS drives. With the default (bursting) throughput mode, the throughput you get is proportional to the amount of data stored on the file system, unless you purchase provisioned throughput. We hit this limit a couple of times, and it was tricky to figure out where the issue was.