Create ODBC connection in Airflow

Open Database Connectivity (ODBC) is a protocol that you can use to connect to a Database Management System (DBMS) which uses an ODBC driver. Integrating Airflow with ODBC allows users to connect and manage jobs in a data source that supports ODBC connectivity. This includes connecting Airflow to Synapse, MS SQL Server, Oracle, MySQL etc.

This document provides the basic setup for creating an ODBC connection to use with Synapse SQL.

Prerequisites

Get Connection details

A connection from Airflow to Azure data factory requires the following information:

  • Subscription ID
  • Data factory name
  • Resource group name
  • Application Client ID
  • Tenant ID
  • Client secret
  1. In your Azure portal, navigate to Synapse Analytics and select your subscription. Then, click on your Synapse workspace.
  2. Follow Azure documentation to retrieve your SQL Endpoint, SQL admin username, Name of your SQL pool, and your password.

Create your connection

  1. Open your Astro project and add the following line to your requirements.txt file:

    apache-airflow-providers-odbc
    

    This will install the ODBC provider package, which makes the ODBC connection type available in Airflow.

  2. Add the following OS-level packages to your packages.txt:

    
      gcc
      g++
      unixodbc
      unixodbc-dev
    
    

    These are required for installation of the ODBC driver in your docker containers.

  3. Append the following lines to your Dockerfile:

    
      USER root
      RUN curl https://packages.microsoft.com/keys/microsoft.asc | tee /etc/apt/trusted.gpg.d/microsoft.asc
    
      #Debian 11
      RUN curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list
    
      RUN apt-get update
      RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18
      # optional: for bcp and sqlcmd
      RUN ACCEPT_EULA=Y apt-get install -y mssql-tools18
      # optional: for unixODBC development headers
      RUN apt-get install -y unixodbc-dev
      
      USER astro
      
    

    Refer to the Microsoft documentation for other OS types and versions.

  4. Run astro dev restart to restart your local Airflow environment and apply your changes in requirements.txt.

  5. In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then choose ODBC as the connection type.

  6. Fill out the following connection fields using the information you retrieved from Get connection details:

    • Connection Id: Enter a name for the connection.
    • Host: Enter your SQL Endpoint.
    • Schema: Enter your SQL pool Name.
    • Login: Enter your SQL admin username
    • Password: Enter your password.
    • Port: Enter 1433 as port.
    • Extra: Enter the string {"Driver": "ODBC Driver 18 for SQL Server"}.
  7. Click Test. After the connection test succeeds, click Save.

How it works

Airflow uses PyODBC to connect to the ODBC data sources through the ODBCHook. You can directly use this hook to create your own custom hooks or operators.

References