Connecting to Oracle database from the Astronomer Airflow image

Hi, everyone,

My question is related to both running Airflow locally via the CLI as well as running Astronomer Enterprise on-prem. I believe I have a similar issue to Connect to remote database airflow.

We have an Oracle database hosted on an external Windows server and I have the credentials for a database user, the service name, and the host/port. The service name is also defined within a tnsnames.ora on the host that is initiating the Oracle connection.

Based on my understanding, cx_Oracle has a dependency on the Oracle client libraries. These libraries are typically provided via an Oracle client (the full client and the Instant Client both seem to work).

I have successfully connected through cx_Oracle to our test database, both from my Windows development environment and from a pre-built Instant Client Docker image from Oracle. The code from my Windows test is below.

import cx_Oracle

dsn_tns = cx_Oracle.makedsn('oracle-host', '1521', service_name='DBSERVICE.WORLD')
conn = cx_Oracle.connect(dsn=dsn_tns, user='username', password='password')
data_cursor = conn.cursor()
data_cursor.execute("select * from schema.table where value='something'")
for row in data_cursor:
    print('{0} - {1}'.format(row[0], row[1]))
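As an aside, cx_Oracle also accepts an Easy Connect descriptor in place of a makedsn()/tnsnames.ora DSN, which avoids needing TNS_ADMIN at all. A minimal sketch, reusing the placeholder host and service name from the example above:

```python
# Build an Easy Connect descriptor ("host:port/service_name") that
# cx_Oracle.connect() accepts directly as its dsn argument.
host = 'oracle-host'            # placeholder, as in the example above
port = 1521
service_name = 'DBSERVICE.WORLD'

dsn = '{0}:{1}/{2}'.format(host, port, service_name)
print(dsn)  # oracle-host:1521/DBSERVICE.WORLD

# With cx_Oracle installed, the connection would then be:
# conn = cx_Oracle.connect(user='username', password='password', dsn=dsn)
```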

What is the recommended approach to do something similar with the Airflow image provided by Astronomer (either using the astro cli or using Astronomer Enterprise)? The Oracle Instant Client does not appear to be installable via apt-get, which I know the CLI uses when you add an OS package to packages.txt. I was able to connect from a bash session within the astro cli scheduler container by doing the following:

  1. Downloaded the Oracle client rpm files directly from Oracle.
  2. Copied them to the running scheduler container.
  3. Converted and installed the rpms with alien.
  4. Copied my tnsnames.ora to the scheduler container and created the TNS_ADMIN environment variable with the path to the tnsnames.ora.
  5. Opened a sqlplus session and connected using the credentials.

The ultimate goal is to be able to interact with our Oracle databases from within Airflow either using a PythonOperator or the oracle_hook/oracle_operator.

Thanks to anyone who has a suggestion.

I think this could be done with a Docker multi-stage build. You build one image with your Oracle libs and then copy them into the Astronomer image with COPY --from.
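A rough sketch of what that multi-stage approach might look like. The Instant Client zip filename, version, and unpack path below are assumptions for illustration, not something I've tested against the Astronomer image:

```dockerfile
# Stage 1: a throwaway image that just unpacks the Instant Client
FROM debian:buster AS oracle-client
RUN apt-get update && apt-get install -y unzip
# Assumes the Instant Client zip was downloaded from Oracle into the build context
COPY instantclient-basic-linux.x64-19.6.0.0.0dbru.zip /tmp/
RUN unzip /tmp/instantclient-*.zip -d /opt/oracle

# Stage 2: the Astronomer image, with only the client libraries copied in
FROM astronomerinc/ap-airflow:1.10.10-buster-onbuild
USER root
COPY --from=oracle-client /opt/oracle /opt/oracle
RUN echo /opt/oracle/instantclient_19_6 > /etc/ld.so.conf.d/oracle.conf && ldconfig
USER astro
```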

I’m working on a Dockerfile now. If I succeed I’ll post the results!

I got this all working so I wanted to share my work for anyone else who needs this in the future. The Dockerfile builds from the 1.10.10 Debian base, sets up a TNS_ADMIN env variable so that the Oracle client can find the tnsnames.ora file we use, installs alien and other needed libraries, runs alien to convert the rpms and install them, makes a change to oracle.conf, and finally installs cx_Oracle (since I was having certificate issues). The Oracle client rpms were downloaded directly from Oracle and included in the image.

If anyone feels so inclined, feel free to critique my Dockerfile as I am new to this.

FROM astronomerinc/ap-airflow:1.10.10-buster-onbuild

USER root

# Set up the path to the tnsnames.ora file and copy it to the folder

ENV TNS_ADMIN=/tns_admin


COPY tnsnames/tnsnames.ora /tns_admin/tnsnames.ora

RUN apt-get update \
    && apt-get install -y alien libaio1 libaio-dev

# Use alien to convert and install the Oracle client packages

RUN alien -i oracleclientrpm/*.rpm

# Corrects an issue running sqlplus

RUN echo /usr/lib/oracle/18.5/client64/lib > /etc/ld.so.conf.d/oracle.conf

RUN ldconfig

RUN pip install cx_Oracle-7.3.0.tar.gz

USER astro