Open Database Connectivity (ODBC) is a protocol that you can use to connect to a Database Management System (DBMS) which uses an ODBC driver. Integrating Airflow with ODBC allows users to connect and manage jobs in a data source that supports ODBC connectivity. This includes connecting Airflow to Synapse, MS SQL Server, Oracle, MySQL etc.
This document provides the basic setup for creating an ODBC connection to use with Synapse SQL.
Prerequisites
- The Astro CLI.
- A locally running Astro project.
- An Azure Synapse workspace.
- Permissions to access your Synapse Database from Airflow.
Get Connection details
A connection from Airflow to Azure data factory requires the following information:
- Subscription ID
- Data factory name
- Resource group name
- Application Client ID
- Tenant ID
- Client secret
- In your Azure portal, navigate to Synapse Analytics and select your subscription. Then, click on your Synapse workspace.
- Follow Azure documentation to retrieve your SQL Endpoint, SQL admin username, Name of your SQL pool, and your password.
Create your connection
-
Open your Astro project and add the following line to your
requirements.txt
file:apache-airflow-providers-odbc
This will install the ODBC provider package, which makes the ODBC connection type available in Airflow.
-
Add the following OS-level packages to your
packages.txt
:gcc g++ unixodbc unixodbc-dev
These are required for installation of the ODBC driver in your docker containers.
-
Append the following lines to your
Dockerfile
:USER root RUN curl https://packages.microsoft.com/keys/microsoft.asc | tee /etc/apt/trusted.gpg.d/microsoft.asc #Debian 11 RUN curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list RUN apt-get update RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18 # optional: for bcp and sqlcmd RUN ACCEPT_EULA=Y apt-get install -y mssql-tools18 # optional: for unixODBC development headers RUN apt-get install -y unixodbc-dev USER astro
Refer to the Microsoft documentation for other OS types and versions.
-
Run
astro dev restart
to restart your local Airflow environment and apply your changes inrequirements.txt
. -
In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then choose ODBC as the connection type.
-
Fill out the following connection fields using the information you retrieved from Get connection details:
- Connection Id: Enter a name for the connection.
- Host: Enter your SQL Endpoint.
- Schema: Enter your SQL pool Name.
- Login: Enter your SQL admin username
- Password: Enter your password.
- Port: Enter
1433
as port. - Extra: Enter the string
{"Driver": "ODBC Driver 18 for SQL Server"}
.
-
Click Test. After the connection test succeeds, click Save.
How it works
Airflow uses PyODBC to connect to the ODBC data sources through the ODBCHook. You can directly use this hook to create your own custom hooks or operators.