I am trying to read data from a local oracle database (Oracle Database 11g Release 11.2.0.4.0 - 64bit Production).
I have created my own docker image:
# Use the base Apache Airflow image
FROM apache/airflow:2.8.3
# Set environment variables for the Oracle Instant Client
ENV ORACLE_HOME=/opt/oracle
ENV LD_LIBRARY_PATH=$ORACLE_HOME/lib
# Switch to the root user temporarily
USER root
# Install required packages
RUN apt-get update && apt-get install -y \
unzip \
libaio1 \
&& rm -rf /var/lib/apt/lists/*
# Create the Oracle directory with proper permissions
RUN mkdir -p $ORACLE_HOME && chown -R airflow: $ORACLE_HOME
# Create the tmp directory with proper permissions
RUN mkdir -p /tmp && chown -R airflow: /tmp
# Switch back to a non-root user (e.g., "airflow")
USER airflow
# Install Python packages required for Oracle client and Airflow Oracle provider
RUN pip install oracledb apache-airflow-providers-oracle pandas sqlalchemy
# Download and install the Oracle Instant Client
# https://www.oracle.com/de/database/technologies/instant-client/linux-x86-64-downloads.html
COPY instantclient-basic-linux.x64-19.22.0.0.0dbru.zip /tmp/
COPY instantclient-sdk-linux.x64-19.22.0.0.0dbru.zip /tmp/
RUN unzip /tmp/instantclient-basic-linux.x64-19.22.0.0.0dbru.zip -d $ORACLE_HOME && \
unzip -o /tmp/instantclient-sdk-linux.x64-19.22.0.0.0dbru.zip -d $ORACLE_HOME
# Delete ZIP Files
RUN rm -rf /tmp/instantclient-basic-linux.x64-19.22.0.0.0dbru.zip /tmp/instantclient-sdk-linux.x64-19.22.0.0.0dbru.zip
# Set environment variables for Oracle Instant Client
ENV PATH=$PATH:$ORACLE_HOME
ENV ORACLE_VERSION=19.22
In short: oracledb apache-airflow-providers-oracle pandas sqlalchemy as well as instantclient-basic-linux.x64-19.22.0.0.0dbru.zip from oracle.com was added to the standard image.
docker compose build works, docker compose init works, docker compose up works without errors.
Oracle shows up as connection type and I have created a connection:
My test DAG looks like this:
from datetime import datetime
from airflow import DAG
from airflow.decorators import task
from airflow.providers.oracle.hooks.oracle import OracleHook
table_name = "DB.TABLE"
@task
def get_data_from_oracle():
oracle_hook=OracleHook(oracle_conn_id='Oracle_***')
data = oracle_hook.get_pandas_df(sql=f"SELECT COUNT(1) FROM {table_name}")
return data.to_dict()
with DAG('oracle_test', start_date=datetime(2022, 1, 1), schedule_interval='@once') as dag:
data = get_data_from_oracle()
If I run said DAG I get errors which I can not resolve. It runs for around a minute and terminates with this errors: (I assume time out)
[2024-03-21T10:27:31.754+0000] {standard_task_runner.py:107} ERROR - Failed to execute job 8 for task get_data_from_oracle (DPY-6005: cannot connect to database (CONNECTION_ID=P26GiMCTIFWWPDhhkB4vZQ==). DPY-4011: the database or network closed the connection [Errno 104] Connection reset by peer Help: https://python-oracledb.readthedocs.io/en/latest/user_guide/troubleshooting.html#dpy-4011; 534)
Or, depending on configuration: (no time out)
[2024-03-21, 16:57:08 CET] {standard_task_runner.py:107} ERROR - Failed to execute job 16 for task get_data_from_oracle (DPY-6005: cannot connect to database (CONNECTION_ID=Ma2uz0YYUUhkdOMBsNNS0w==). DPY-6003: SID "**" is not registered with the listener at host "10..." port 1524. (Similar to ORA-12505); 5827)
I double checked ip, ping, SID (via Query) and so on.
I assume I do something horribly wrong in either
Either way internet research lead to no solution. If anyone can provide me with tips or a working way to connect to an Oracle DB I'd be grateful. Please don't make me go back to SSIS.
Update: The same query towards a DB with Oracle Database 12c Release 12.1.0.1.0 - 64bit Production works without any problems. It seems to be an issue with version / drivers.
I finally figured it out: A connection with the right parameter (I hade a mistake there because copy paste from tnsping seems to give 2 hosts by default) gives you this error:
DPY-3010: connections to this database server version are not supported by python-oracledb in thin mode
Oracle States:
For example, Oracle Call Interface 21 can connect to Oracle Database 12.1 or later, while Oracle Call Interface 19.3 can connect to Oracle Database 11.2 or later. Some tools may have other restrictions.
Source: https://www.oracle.com/de/database/technologies/instant-client/linux-x86-64-downloads.html
The problem with downgrading is that you might run into compatibility problems with newer versions of oracle or airflow. So the work around for that is using "thick mode" which supports older versions of oracle.
With help of Error DPY-3010 when connecting python-oracledb to Oracle DB 11.2 you get to https://airflow.apache.org/docs/apache-airflow-providers-oracle/stable/connections/oracle.html which explains how to access an oracle db via said "thick mode" in a DAG:
def get_data_from_oracle():
oracle_hook=OracleHook(oracle_conn_id='Connection_Name', thick_mode=True)
data = oracle_hook.get_pandas_df(sql=f"SELECT COUNT(1) FROM {table_name}")
return data.to_dict()
Next you need to make sure that you set up the oracle instant client correctly. The oracle Zip (instantclient-basic-linux.x64-19.22.0.0.0dbru.zip) contains a subfolder with the version name. If you copy its contents to $ORACLE_HOME airflow / python / oracle will not be able to find the files in the expected folder.
DPI-1047: Cannot locate a 64-bit Oracle Client library: "/opt/oracle/lib/libclntsh.so: cannot open shared object file: No such file or directory".
I perma fixed this with unzip -j (flatten folders in ZIP) in my dockerfile:
# Use the base Apache Airflow image
FROM apache/airflow:2.8.3
# Set environment variables for the Oracle Instant Client
ENV ORACLE_HOME=/opt/oracle
ENV LD_LIBRARY_PATH=$ORACLE_HOME
# Switch to the root user temporarily
USER root
# Install required packages
RUN apt-get update && apt-get install -y \
unzip \
libaio1 \
&& rm -rf /var/lib/apt/lists/*
# Create the Oracle directory with proper permissions
RUN mkdir -p $ORACLE_HOME && chown -R airflow: $ORACLE_HOME
# Create the tmp directory with proper permissions
RUN mkdir -p /tmp && chown -R airflow: /tmp
# Switch back to a non-root user (e.g., "airflow")
USER airflow
# Install Python packages required for Oracle client and Airflow Oracle provider
RUN pip install cx_Oracle oracledb apache-airflow-providers-oracle pandas sqlalchemy
# Download and install the Oracle Instant Client
# https://www.oracle.com/de/database/technologies/instant-client/linux-x86-64-downloads.html
COPY instantclient-basic-linux.x64-19.22.0.0.0dbru.zip /tmp/
RUN unzip -j /tmp/instantclient-basic-linux.x64-19.22.0.0.0dbru.zip -d $ORACLE_HOME
# Delete ZIP Files
RUN rm -rf /tmp/instantclient-basic-linux.x64-19.22.0.0.0dbru.zip
# Set environment variables for Oracle Instant Client
ENV PATH=$PATH:$ORACLE_HOME
ENV ORACLE_VERSION=19.22
With this in place connections to "old" (pre 12.1) systems work as long as you enable thick mode.
Next steps:
P.s. thank you https://stackoverflow.com/users/610979/frank-schmitt for the comments!