pythonpandasdataframesqlalchemypython-oracledb

Python using oracledb to connect to Oracle database with Pandas DataFrame Error: "pandas only supports SQLAlchemy connectable (engine/connection)"


I am pretty new to Python and even newer to Pandas and am hoping for some guidance

My company has an on-prem DEV Oracle database that I am trying to connect to using Python & Pandas. After some searching I found that the Python package "oracledb" was recommended to be used.

Using VS code .IPYNB I have the following chunks of code that seem to work with no errors

# python -m pip install --upgrade pandas
import oracledb
import pandas as pd
from sqlalchemy import create_engine

connection = oracledb.connect(user="TEST", password="TESTING", dsn="TESTDB:1234/TEST")

print("Connected")
print(connection)

The above code seems to run just fine which is great

I then run the below code as a quick test

cursor=connection.cursor()
query_test='select * from dm_cnf_date where rownum < 2'

for row in cursor.execute(query_test):
    print(row)

This returns a tuple with a row of data so far so good, looks like I can connect to the database and run a query.

Next I wanted to get the data into a Pandas dataframe and this is where I got stuck

I tried this code

df = pd.read_sql(sql=query_test, con=connection)

Which then I get hit with the following error

:1: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. df = pd.read_sql(sql=query_test, con=connection)

I was loosely trying to follow this article ("Read data as pandas DataFrame"): https://kontext.tech/article/1019/python-read-data-from-oracle-database

but it didnt seem to work.

I tried to take a look at the sqlalchemy site here: https://docs.sqlalchemy.org/en/20/dialects/oracle.html#module-sqlalchemy.dialects.oracle.oracledb

Which I tried to rewrite my code a bit as follows

conn_url="oracle+oracledb://TEST:TESTING@TESTDB:1234/TEST"
engine=create_engine(conn_url)

df = pd.read_sql(sql=query_test, con=engine)

And I get hit with another error

OperationalError: DPY-6003: SID "TEST" is not registered with the listener at host "TESTDB" port 1234. (Similar to ORA-12505)

Just looking to connect to an Oracle DB and grab data into a Pandas dataframe but keep hitting a wall

Any insight would be very much appreciated


Solution

  • Try either of:

    #oracledb.defaults.arraysize = 1000 # Tune for big queries
    
    with oracledb.connect(user=un, password=pw, dsn=cs) as connection:
        with connection.cursor() as cursor:
           cursor.execute("select * from emp")
           col_names = [c.name for c in cursor.description]
           data = cursor.fetchall()
           df = pandas.DataFrame(data, columns=col_names)
           print(df)
    

    or

    engine = sqlalchemy.create_engine(
        'oracle+oracledb://',
        thick_mode=None,
        connect_args={
            "user": un,
            "password": pw,
            "dsn": cs
        }
        #, echo=True  # SQLAlchemy tracing / logging / debugging
        )
    
    emp_sql = "select * from emp"
    df_emp = pd.read_sql(emp_sql, engine)
    print(df_emp)
    

    The latter is slower. You can enable the tracing and see the number of queries that are executed internally by SQLAlchemy.

    Regarding the ORA-12505 error using:

    "oracle+oracledb://TEST:TESTING@TESTDB:1234/TEST"
    

    you probably needed to use:

    "oracle+oracledb://TEST:TESTING@TESTDB:1234?service_name=TEST"
    

    see https://docs.sqlalchemy.org/en/20/dialects/oracle.html#dialect-oracle-oracledb-connect

    Back in the old days, 'system identifiers' (SIDs) were commonly used for connecting to Oracle DB. SQLAlchemy defaults to use these in its syntax. However most DBs are now connected to via a 'service name', hence the need for the extra connection keyword. See the note in the SQLAlchemy doc:

    Note that although the SQLAlchemy URL syntax hostname:port/dbname looks like Oracle’s Easy Connect syntax, it is different. SQLAlchemy’s URL requires a system identifier (SID) for the dbname component

    You should continue using your 'service name' TEST, and not try to find what SID to use.