I want to know which parts of the platform require persistent storage.
- For the Airflow layer, we recommend running an externally managed Postgres (RDS, Cloud SQL, etc.), as almost all of these come with HA guarantees, regular backups, and so on. This database holds all metadata pertaining to your DAG runs. We'd recommend giving the platform its own Postgres instance, as each Airflow deployment will have its own database on that instance.
- On the Kubernetes layer, we rely on persistent volume claims (PVCs) for part of our user experience. We'll need storage for a registry (which helps with our deployment process), Prometheus (for metrics scraping), and Elasticsearch (for logs). You are free to send this data to services off the cluster as well (e.g., you can configure fluentd to send logs to another service), but the in-cluster storage is required for how our platform works (e.g., our API pulls logs from Elasticsearch and brings them into our UI for users to track in real time).
As for which pods make claims: only a few of the base platform pods will create PVCs, namely Prometheus and Alertmanager, a few for Elasticsearch, and the registry.
    $ kubectl get pvc -n astro-demo
    NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    data-calico-crab-alertmanager-0           Bound    pvc-68dff041-0afd-11ea-a83e-4201ac10000b   2Gi        RWO            standard       13d
    data-calico-crab-elasticsearch-data-0     Bound    pvc-69300cba-0afd-11ea-a83e-4201ac10000b   100Gi      RWO            standard       13d
    data-calico-crab-elasticsearch-data-1     Bound    pvc-7be1fa73-0afd-11ea-a83e-4201ac10000b   100Gi      RWO            standard       13d
    data-calico-crab-elasticsearch-master-0   Bound    pvc-6802a367-0afd-11ea-a83e-4201ac10000b   20Gi       RWO            standard       13d
    data-calico-crab-elasticsearch-master-1   Bound    pvc-78437707-0afd-11ea-a83e-4201ac10000b   20Gi       RWO            standard       13d
    data-calico-crab-elasticsearch-master-2   Bound    pvc-bcb7680d-0b17-11ea-a83e-4201ac10000b   20Gi       RWO            standard       13d
    data-calico-crab-prometheus-0             Bound    pvc-6895e140-0afd-11ea-a83e-4201ac10000b   150Gi      RWO            standard       13d
    data-calico-crab-registry-0               Bound    pvc-690f66b7-0afd-11ea-a83e-4201ac10000b   100Gi      RWO            standard       13d
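If you want a rough total of the storage the base platform claims, one option is to sum the CAPACITY column of that listing. The helper below is only a sketch: `sum_pvc_gi` is a hypothetical name, and it assumes every capacity is reported in whole `Gi`, as in the listing above. Rows with other units, or with no capacity yet, are skipped.

```shell
# Hypothetical helper: sum the CAPACITY column of `kubectl get pvc` output
# read from stdin. Assumes capacities are whole numbers of Gi; rows whose
# fourth column does not look like "<n>Gi" (the header, other units, or
# PVCs with no capacity yet) are ignored.
sum_pvc_gi() {
  awk '$4 ~ /^[0-9]+Gi$/ { sub(/Gi$/, "", $4); total += $4 }
       END { print total + 0 }'
}

# Usage against a live cluster (namespace taken from the listing above):
#   kubectl get pvc -n astro-demo | sum_pvc_gi
```

For the eight claims listed above, this adds up to 512 (Gi).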
As of Astronomer 0.8.2, Airflow namespaces running the Local Executor or the Kubernetes Executor will not create any PVCs:
    $ kubectl get pvc -n astro-demo-true-hemisphere-2825
    No resources found in astro-demo-true-hemisphere-2825 namespace.
Airflow environments running the Celery Executor will create a PVC for Redis.
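To check whether a given Airflow namespace has made such a claim, you could filter the PVC listing by name. This is a sketch under assumptions: `pvc_names_matching` is a hypothetical helper, `$AIRFLOW_NAMESPACE` is a placeholder for your deployment's namespace, and it assumes the Redis PVC has "redis" somewhere in its name, which the listing above does not confirm.

```shell
# Hypothetical helper: print the names of PVCs (first column of
# `kubectl get pvc` output read from stdin) that contain the given substring.
# The header row is skipped; empty input prints nothing.
pvc_names_matching() {
  awk -v pat="$1" '$1 != "NAME" && index($1, pat) > 0 { print $1 }'
}

# Usage (assumes the Redis PVC name contains "redis"):
#   kubectl get pvc -n "$AIRFLOW_NAMESPACE" | pvc_names_matching redis
```

An empty result for a Local Executor or Kubernetes Executor namespace would be consistent with the "No resources found" output shown earlier.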