I am running Airflow with the KubernetesExecutor on AWS EKS. We use AWS EFS file systems to back both the DAGs folder and task logging. (We also mount an EFS volume into some worker pods for persistent storage.) The volumes are mounted via PersistentVolumes and PersistentVolumeClaims.
Everything functions correctly, but the solution won't scale: as the number of task pods increases, the time required to mount the volumes onto the pods grows near-linearly. The pods sit in the Pending phase and show FailedMount events in the output of kubectl describe pod.
Eventually, tasks created with the KubernetesPodOperator begin to fail with “Pod took too long to start.” exceptions, even when I set the startup timeout above 6 minutes.
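For reference, a minimal sketch of how the failing tasks are defined. The DAG ID, task names, and image are placeholders, and I'm assuming a recent apache-airflow-providers-cncf-kubernetes; startup_timeout_seconds is the parameter I've been raising:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="example_dag",          # placeholder
    start_date=datetime(2023, 1, 1),
    schedule=None,
) as dag:
    task = KubernetesPodOperator(
        task_id="example_task",     # placeholder
        name="example-task",
        image="busybox",            # placeholder image
        cmds=["sh", "-c", "echo done"],
        # Raised well past the 120s default; pods still time out waiting
        # for the EFS volumes to mount once many tasks run concurrently.
        startup_timeout_seconds=600,
    )
```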
The volumes are mounted with the EFS CSI driver. I have also tried defining the PVs as hostPath volumes, with EFS mounted on the underlying EC2 instances, and saw the same outcome.
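The PV/PVC pair for the CSI case looks roughly like this (names and the EFS filesystem ID are placeholders; static provisioning, so the claim binds directly to the PV):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  capacity:
    storage: 5Gi                  # required by the API, not enforced by EFS
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678     # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # bind statically to the PV above
  resources:
    requests:
      storage: 5Gi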
Any comments are appreciated.