Overview
Airflow users looking to pass a relatively large amount of data to a destination (e.g. downloading data from somewhere and dumping it to S3) might need to configure ephemeral storage on a Celery Worker or Kubernetes Worker Pod.
On Astronomer, ephemeral storage is configured at the platform level and applies to all Celery Workers or Kubernetes Worker Pods on the platform: https://github.com/astronomer/astronomer/blob/master/charts/astronomer/templates/houston/houston-configmap.yaml#L128. The default ephemeral storage limit is 2Gi.
If you have a task that needs more than 2Gi of ephemeral storage, you can:
- A. Raise the default at the platform level and have it apply to all workers/tasks (though it’s likely that not all workers will need that much storage)
- B. Switch to the KubernetesExecutor and request/mount a volume at the task level (you can’t set volume requests with Celery) and then manually delete that volume from the cluster after the task has completed
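As a rough illustration of option B, the sketch below builds an Airflow 1.10-style executor_config dictionary that mounts an existing PersistentVolumeClaim into the task's worker pod. The helper name, claim name, volume name, and mount path are all hypothetical, and it assumes the claim already exists in the cluster. Airflow 2 expresses the same idea through a pod_override object instead, so treat this as a starting point rather than a drop-in configuration.

```python
def volume_executor_config(claim_name, mount_path, volume_name="task-scratch"):
    """Build a 1.10-style executor_config that mounts an existing PVC.

    All argument values are illustrative; the PVC must already exist.
    """
    return {
        "KubernetesExecutor": {
            # Volume definition, in the same shape as a Kubernetes pod spec
            "volumes": [
                {
                    "name": volume_name,
                    "persistentVolumeClaim": {"claimName": claim_name},
                }
            ],
            # Where the volume appears inside the task's container
            "volume_mounts": [
                {"name": volume_name, "mountPath": mount_path}
            ],
        }
    }


# Hypothetical claim name and mount path
config = volume_executor_config("large-download-pvc", "/tmp/scratch")
```

The resulting dictionary would be passed to a single task through its executor_config argument, so only that task gets the extra volume. Cleaning up the volume afterwards remains a manual step, as noted in option B.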
Note: As of Airflow 1.10.11, users can use the KubernetesPodOperator to define ephemeral storage at the task level (apache/airflow#6337), without having to touch platform-level defaults.
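To make the task-level setting concrete, here is a minimal sketch that builds the ephemeral storage part of a container's resources in the shape Kubernetes itself uses (requests and limits keyed by ephemeral-storage). The helper names and the 4Gi/8Gi values are assumptions for illustration. How this is handed to the KubernetesPodOperator differs between Airflow versions, so that wiring is deliberately left out.

```python
import re

# Binary suffixes used by Kubernetes quantities (subset, for illustration)
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity):
    """Convert a quantity such as '2Gi' to bytes (integers and binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS.get(match.group(2), 1)


def ephemeral_storage_resources(request, limit):
    """Return a Kubernetes-style resources dict for ephemeral storage."""
    # Kubernetes rejects a container whose request exceeds its limit
    if to_bytes(request) > to_bytes(limit):
        raise ValueError("request must not exceed limit")
    return {
        "requests": {"ephemeral-storage": request},
        "limits": {"ephemeral-storage": limit},
    }


# Hypothetical values for a task that needs more than the 2Gi default
resources = ephemeral_storage_resources("4Gi", "8Gi")
```

Checking the request against the limit up front surfaces a bad value when the DAG is parsed rather than when the pod is rejected by the cluster.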
Additional Notes:
- Currently, ephemeral storage is not accounted for in AUs and is not affected by changes to “Extra Capacity”
- The maximum ephemeral storage limit is 50Gi (i.e. the maximum value you could set the default to above). It is hardcoded into our Houston API’s configmap here: https://github.com/astronomer/astronomer/blob/master/charts/astronomer/templates/houston/houston-configmap.yaml#L213