Many teams using Airflow integrate with Databricks (a managed Apache Spark service) to offload execution of the jobs Airflow orchestrates. A few notes on doing this from Astronomer Cloud:
1. Create a native Airflow Databricks Connection
You can rely on Airflow's native connections and operators to connect to Databricks from Astronomer Cloud.
In Airflow, operators delegate authentication and connection handling to a hook object, so the databricks_operator hands off to the corresponding databricks_hook.
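The step above can be sketched as a minimal DAG that submits a one-off notebook run through the Databricks operator. This is a hedged sketch, not a drop-in file: the connection ID, notebook path, and cluster spec are placeholders, and on older Airflow versions the operator lives at `airflow.contrib.operators.databricks_operator` instead of the provider package used here.

```python
# Sketch: submit a one-off Databricks notebook run from Airflow.
# Placeholder values: connection ID, notebook path, and cluster spec.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="databricks_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    notebook_run = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",  # the Connection from step 1
        json={
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            # Hypothetical notebook path for illustration
            "notebook_task": {"notebook_path": "/Users/you@example.com/my-notebook"},
        },
    )
```

Under the hood, the operator pulls host and credentials from the `databricks_default` Connection via the Databricks hook, so no secrets live in the DAG file itself.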
2. Whitelist Astronomer Cloud’s Static IP
We route all Astronomer Cloud traffic through a single NAT gateway, so you'll need to whitelist Astronomer's static IP address (35.188.248.243) in Databricks.
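If Databricks rejects your requests after whitelisting, it can help to confirm which IP your workers actually egress from. One quick, hedged way is to ask an echo-your-IP service from inside a running task (e.g. via a BashOperator); `api.ipify.org` here is just one arbitrary such service:

```shell
# Print the public IP this machine egresses from.
# From an Astronomer Cloud worker, this should match 35.188.248.243;
# if it doesn't, traffic isn't leaving via the expected NAT gateway.
curl -s https://api.ipify.org
echo
```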
Resources
- “Integrating Apache Airflow with Databricks” by Databricks
- “Whitelist IP Addresses” by Databricks
- VPC Access Doc on Astronomer
Note: If you’re using Databricks, Astronomer Enterprise may be worth considering down the line. With a self-hosted setup, you can attach the appropriate AWS IAM roles to your nodes or peer the corresponding VPCs as explained here, rather than whitelisting a public IP.