Create ODBC connection in Airflow

Open Database Connectivity (ODBC) is a standard API for connecting to a database management system (DBMS) through an ODBC driver. Integrating Airflow with ODBC lets you connect to and manage jobs in any data source that supports ODBC connectivity, including Azure Synapse, Microsoft SQL Server, Oracle, and MySQL.

This document provides the basic setup for creating an ODBC connection to use with Synapse SQL.

Prerequisites

  • The Astro CLI and a locally running Astro project
  • An Azure Synapse workspace with a SQL pool

Get connection details

A connection from Airflow to Azure Synapse requires the following information:

  • SQL endpoint
  • SQL pool name
  • SQL admin username
  • Password
  1. In the Azure portal, navigate to Synapse Analytics and select your subscription. Then, click your Synapse workspace.
  2. Follow the Azure documentation to retrieve your SQL endpoint, SQL admin username, SQL pool name, and password.

Create your connection

  1. Open your Astro project and add the following line to your requirements.txt file:

    apache-airflow-providers-odbc
    

    This will install the ODBC provider package, which makes the ODBC connection type available in Airflow.

  2. Add the following OS-level packages to your packages.txt file:

      gcc
      g++
      unixodbc
      unixodbc-dev

    These packages are required to install the ODBC driver in your Docker containers.

  3. Append the following lines to your Dockerfile:

      USER root
      RUN curl https://packages.microsoft.com/keys/microsoft.asc | tee /etc/apt/trusted.gpg.d/microsoft.asc

      # Debian 11
      RUN curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list

      RUN apt-get update
      RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18
      # optional: for bcp and sqlcmd
      RUN ACCEPT_EULA=Y apt-get install -y mssql-tools18
      # optional: for unixODBC development headers
      RUN apt-get install -y unixodbc-dev

      USER astro

    Refer to the Microsoft documentation for other OS types and versions.
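    To confirm that the driver installed correctly after rebuilding, you can open a shell in your local Airflow environment and query the unixODBC driver registry. This is a quick check using standard Astro CLI and unixODBC commands; the driver name in the output must match what you enter in the connection's Extra field later:

      # open a shell inside the local Airflow scheduler container
      astro dev bash

      # list the ODBC drivers registered with unixODBC
      odbcinst -q -d

    The output should include [ODBC Driver 18 for SQL Server].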

  4. Run astro dev restart to restart your local Airflow environment and apply the changes to your requirements.txt, packages.txt, and Dockerfile.

  5. In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then choose ODBC as the connection type.

  6. Fill out the following connection fields using the information you retrieved from Get connection details:

    • Connection Id: Enter a name for the connection.
    • Host: Enter your SQL endpoint.
    • Schema: Enter your SQL pool name.
    • Login: Enter your SQL admin username.
    • Password: Enter your password.
    • Port: Enter 1433.
    • Extra: Enter the string {"Driver": "ODBC Driver 18 for SQL Server"}. Note that Airflow only reads a driver from the Extra field if the allow_driver_in_extra setting is enabled, as described in Troubleshooting below.
  7. Click Test. After the connection test succeeds, click Save.

How it works

Airflow uses the pyodbc library to connect to ODBC data sources through the OdbcHook. You can use this hook directly in your own tasks, or build custom hooks and operators on top of it.
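The following is a minimal sketch of a DAG that queries Synapse through the connection created above. The connection ID my_odbc_conn and the query are placeholders for illustration:

    from airflow.decorators import dag, task
    from airflow.providers.odbc.hooks.odbc import OdbcHook
    import pendulum


    @dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
    def odbc_example():
        @task
        def fetch_tables():
            # "my_odbc_conn" is the Connection Id you entered in the Airflow UI
            hook = OdbcHook(odbc_conn_id="my_odbc_conn")
            # get_records runs the query through pyodbc and returns the result rows
            rows = hook.get_records("SELECT TOP 10 * FROM INFORMATION_SCHEMA.TABLES")
            # convert pyodbc rows to plain tuples so they serialize cleanly as XCom
            return [tuple(row) for row in rows]

        fetch_tables()


    odbc_example()

Because OdbcHook subclasses Airflow's DbApiHook, you can also use methods such as run() for statements that don't return rows, or get_conn() for a raw pyodbc connection.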

Troubleshooting

If the connection test fails with the following error:

    [unixODBC][Driver Manager]Data source name not found and no default driver specified (0) (SQLDriverConnect)

the most likely cause is that the driver is not properly installed or not specified in the connection. Double-check the following:

  1. The instructions to install the ODBC driver in your Dockerfile are correct.
  2. You restarted your Airflow environment after changing the Dockerfile.
  3. You included the driver details in the Extra field of the connection.

Additionally, as documented for the Airflow ODBC provider, Airflow ignores a Driver value in the Extra field unless the allow_driver_in_extra parameter is set to True in the providers.odbc config section. You can set it with the environment variable AIRFLOW__PROVIDERS_ODBC__ALLOW_DRIVER_IN_EXTRA=true.
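In an Astro project, one way to set this variable locally is in the .env file at the project root, which the Astro CLI loads when the environment starts (a sketch; any standard method for setting Airflow environment variables also works):

    # .env in the root of your Astro project
    AIRFLOW__PROVIDERS_ODBC__ALLOW_DRIVER_IN_EXTRA=true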