Storing a Databricks Connection as an Airflow Environment Variable

I want to store my Databricks connection information as an environment variable, as described in the Managing Connections page of the Airflow documentation.

I am also looking at the following:
https://docs.databricks.com/dev-tools/data-pipelines.html

It says to set the login as: {"token": "", "host": ""}
I’m not sure what to export. Does anyone have a clue? I have the token etc., but what is the export statement?

Thanks for reaching out here, @bclipp770! It’s good to hear you’re looking to store a Databricks connection in an environment variable, since that keeps the connection out of your Postgres database. A few guidelines below.

1. Identify Connection Values

To store your Databricks connection as an environment variable, you need to generate an Airflow Connection URI for it. Start by collecting the values that go into it:

  • conn_id=databricks_default (this is the name of your connection)
  • conn_type=databricks
  • login=token
  • password=<your-databricks-password>
  • host=<your-databricks-host-name>
  • extras={"token": "<personal access token>", "host": "<Databricks hostname>"}

You shouldn’t need values for Port and Schema. For more info, there’s an example near the bottom of Integrating Apache Airflow with Databricks (a Databricks blog post).

Note: As you do this, make sure you also have the Databricks Provider installed (Airflow 2.0+).

2. Generate a Connection URI

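Given the above values, your connection URI would then be as follows. Note that the connection ID in the variable name is upper-cased, because Airflow looks the variable up as AIRFLOW_CONN_ followed by the upper-cased conn_id:

ENV AIRFLOW_CONN_DATABRICKS_DEFAULT=databricks://<your-databricks-access-token>:<your-databricks-password>@<your-databricks-host-name>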
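
If it helps, here is a minimal sketch of assembling that URI by hand, using only the Python standard library rather than anything from Airflow itself. The function name build_conn_uri and the example token and host values are made up for illustration. The key point is that each component is percent-encoded, so characters such as "/" or "+" in a token do not break the URI.

```python
from urllib.parse import quote


def build_conn_uri(conn_type, login, password, host):
    """Assemble conn_type://login:password@host, percent-encoding each part.

    Hypothetical helper for illustration; not part of Airflow.
    """
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    return (
        f"{conn_type}://{quote(login, safe='')}:"
        f"{quote(password, safe='')}@{quote(host, safe='')}"
    )


# Example values are placeholders, not real credentials.
uri = build_conn_uri(
    "databricks", "token", "dapi/example+secret", "example.cloud.databricks.com"
)
print(uri)
```

The printed value is what would go on the right-hand side of the environment variable.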

Note: If you already have this connection in the Airflow UI and want to pull out the Conn URI from it, you can run airflow connections get <conn_id> from the Airflow CLI.

3. Set your Env Var

Now, put that connection URI into an environment variable. In a Dockerfile, that looks like the line below; in a shell, use export instead of ENV:

ENV AIRFLOW_CONN_<CONN_ID>=<connection-uri>
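
Since the original question was about the export statement, here is a rough sketch of how one could generate it. It assumes that Airflow looks the variable up as AIRFLOW_CONN_ plus the upper-cased connection ID; the helper names conn_env_var_name and export_statement are hypothetical, and the URI in the example is a placeholder.

```python
import shlex


def conn_env_var_name(conn_id):
    # Assumption: the variable name is the prefix plus the upper-cased connection ID.
    return "AIRFLOW_CONN_" + conn_id.upper()


def export_statement(conn_id, uri):
    # shlex.quote wraps the URI in single quotes when it contains
    # characters the shell would otherwise interpret (such as < or >).
    return f"export {conn_env_var_name(conn_id)}={shlex.quote(uri)}"


print(export_statement("databricks_default", "databricks://token:<secret>@<host>"))
```

Quoting the value matters mostly when the token contains characters the shell treats specially.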

You should be set! If you’re an Astronomer customer, you can also refer to “Adding Airflow Connections and Variables via Env Vars” in our docs. Want to give this a shot, and let us know if it works?

Thank you @paola ,

I see the difference.

  1. The Databricks guide shows installing with pip install "apache-airflow[databricks]".
  2. It uses an access token.

The guide you shared uses the apache-airflow-providers-databricks provider package from PyPI.
I would rather use the token if possible; that would be the better route for production.
If I put apache-airflow[databricks] in the requirements file, I assume it’s not going to like that.

If needed I can settle for username and password but I would prefer a token.

I just noticed you have the token for the username. I don’t think that will work. I will test and update.

@bclipp770 Yeah, that new Provider syntax is specific to Airflow 2.0, which is when Providers were separated from the core Airflow codebase.

On the token thing, take a look at the Databricks Operator source code - you should actually be able to set login to token and then put your token in the extras field of the Connection instead (and just adjust the Conn URI accordingly). In any case, give all of this a shot and let me know if it works.
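
For what that adjustment might look like, here is a hedged sketch using only the Python standard library. It assumes that extras are carried in the URI as query parameters, which is my understanding of how Airflow parses connection URIs, so please check it against the Airflow docs for your version. The helper name build_uri_with_extras and the example values are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_uri_with_extras(conn_type, login, host, extras):
    """Build conn_type://login@host?key=value, with extras as the query string.

    Hypothetical helper; assumes extras travel as URI query parameters.
    """
    # urlencode percent-encodes each key and value in the extras dict.
    query = urlencode(extras)
    return f"{conn_type}://{quote(login, safe='')}@{quote(host, safe='')}?{query}"


# Placeholder token and host, not real credentials.
uri = build_uri_with_extras(
    "databricks",
    "token",
    "example.cloud.databricks.com",
    {"token": "dapi/example+secret"},
)
print(uri)
```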

I’ll try to find some time tonight, and if not, over the weekend. Thank you.

@bclipp770 Actually, it looks like a team member at Astro just added some instructions around the Databricks connection to the Airflow docs! PR here. From the looks of it, that should be what you need to complete step 1 above. I’ll update my post with that info.

@paola

Since I got the UI connection working, I opened a shell in the container and ran airflow connections get databricks_default to display the URI. I noticed it’s replacing non-alphanumeric characters with their URL-encoded forms. That was throwing me off, I think.
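
For anyone else who gets thrown off by this, here is a small sketch of reading such a URI back into its plain values with the Python standard library. The function name decode_conn_uri and the example URI are made up for illustration, and this is not how Airflow itself necessarily does it.

```python
from urllib.parse import unquote, urlsplit


def decode_conn_uri(uri):
    """Split a connection URI and undo the percent-encoding of each part.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        # username and password come back still percent-encoded, so decode them.
        "login": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        # Note: urlsplit lower-cases the hostname.
        "host": unquote(parts.hostname or ""),
    }


decoded = decode_conn_uri(
    "databricks://token:dapi%2Fexample%2Bsecret@example.cloud.databricks.com"
)
print(decoded["password"])
```

So a sequence like %2F in the displayed URI is just an encoded "/" from the original value.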

Thanks for the help.

Excellent! Glad you got it working @bclipp770.