Hi, everyone,
My question relates both to running Airflow locally via the CLI and to running Astronomer Enterprise on-prem. I believe I have an issue similar to the one described in Connect to remote database airflow.
We have an Oracle database hosted on an external Windows server and I have the credentials for a database user, the service name, and the host/port. The service name is also defined within a tnsnames.ora on the host that is initiating the Oracle connection.
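For reference, the tnsnames.ora entry looks roughly like the following (the alias, host, port, and service name here are placeholders, not our real values):

DBSERVICE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DBSERVICE.WORLD)
    )
  )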
Based on my understanding, cx_Oracle has a dependency on the Oracle client libraries (see cx_Oracle 8 Installation — cx_Oracle 8.3.0 documentation). These libraries are typically provided by an Oracle client installation (either the full client or the Instant Client appears to work).
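For example, when the client libraries are not on the default library search path, cx_Oracle 8 can be pointed at them explicitly. A minimal sketch, where the lib_dir path is just an illustration of where an Instant Client unzip might live:

import cx_Oracle

# Tell cx_Oracle where the Instant Client libraries live if they are not
# already on the system library search path (path is illustrative).
cx_Oracle.init_oracle_client(lib_dir="/opt/oracle/instantclient_21_1")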
I have successfully connected through cx_Oracle to our test database, both from my Windows development environment and by using a pre-built Instant Client Docker image from Oracle. The code from my Windows test is below.
import cx_Oracle

# Build a DSN from the host, port, and service name, then connect.
dsn_tns = cx_Oracle.makedsn('oracle-host', '1521', service_name='DBSERVICE.WORLD')
conn = cx_Oracle.connect(dsn=dsn_tns, user='username', password='password')

data_cursor = conn.cursor()
# Double quotes around the SQL so the string literal inside it does not clash.
data_cursor.execute("select * from schema.table where value = 'something'")
for row in data_cursor:
    print('{0} - {1}'.format(row[0], row[1]))
conn.close()
What is the recommended approach to doing something similar with the Airflow image provided by Astronomer (either using the astro CLI or using Astronomer Enterprise)? The Oracle Instant Client does not appear to be installable via apt-get, which I know the CLI uses when you add an OS package to packages.txt. I was able to connect from a bash session inside the astro CLI scheduler container by doing the following (a rough Dockerfile sketch of what I would like to bake into the image follows this list):
- Downloaded the Oracle client RPM files directly from Oracle.
- Copied them to the running scheduler container.
- Converted and installed the RPMs with alien.
- Copied my tnsnames.ora to the scheduler container and set the TNS_ADMIN environment variable to the directory containing it.
- Opened a sqlplus session and connected with the credentials.
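Ideally I would bake those manual steps into the image instead of repeating them in a running container. Something like the following sketch, appended to the Dockerfile that astro dev init generates, is what I have in mind; the zip file name, paths, and non-root user name are assumptions on my part and not verified against the Astronomer image:

# Appended below the existing FROM line in the generated Dockerfile.
USER root

# Install unzip and libaio (needed by the Instant Client) from apt.
RUN apt-get update && apt-get install -y --no-install-recommends libaio1 unzip \
    && rm -rf /var/lib/apt/lists/*

# Unpack a locally downloaded Instant Client zip that sits next to the Dockerfile
# (file name is a placeholder), and register its libraries with the loader.
COPY instantclient-basic-linuxx64.zip /opt/oracle/
RUN cd /opt/oracle \
    && unzip instantclient-basic-linuxx64.zip \
    && rm instantclient-basic-linuxx64.zip \
    && echo /opt/oracle/instantclient_* > /etc/ld.so.conf.d/oracle-instantclient.conf \
    && ldconfig

# Ship our tnsnames.ora with the project and point the client at it.
COPY tnsnames.ora /opt/oracle/network/admin/tnsnames.ora
ENV TNS_ADMIN=/opt/oracle/network/admin

# Drop back to the image's non-root user (user name assumed).
USER astro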
The ultimate goal is to be able to interact with our Oracle databases from within Airflow, either through a PythonOperator or via the oracle_hook/oracle_operator.
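For concreteness, the kind of task I am aiming for looks roughly like this PythonOperator sketch; the connection details are placeholders, and I would eventually pull them from an Airflow Connection rather than hard-coding them:

from datetime import datetime

import cx_Oracle
from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def query_oracle():
    # Placeholder connection details; ultimately these would come from
    # an Airflow Connection rather than being hard-coded here.
    dsn_tns = cx_Oracle.makedsn('oracle-host', '1521', service_name='DBSERVICE.WORLD')
    conn = cx_Oracle.connect(dsn=dsn_tns, user='username', password='password')
    try:
        cursor = conn.cursor()
        cursor.execute("select * from schema.table where value = 'something'")
        for row in cursor:
            print('{0} - {1}'.format(row[0], row[1]))
    finally:
        conn.close()


with DAG(
    dag_id='oracle_example',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    PythonOperator(task_id='query_oracle', python_callable=query_oracle)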
Thanks to anyone who has a suggestion.