I am pretty new to Python and even newer to Pandas and am hoping for some guidance
My company has an on-prem DEV Oracle database that I am trying to connect to using Python & Pandas. After some searching I found that the Python package "oracledb" was recommended to be used.
Using VS code .IPYNB I have the following chunks of code that seem to work with no errors
# python -m pip install --upgrade pandas
import oracledb
import pandas as pd
from sqlalchemy import create_engine
connection = oracledb.connect(user="TEST", password="TESTING", dsn="TESTDB:1234/TEST")
print("Connected")
print(connection)
The above code seems to run just fine which is great
I then run the below code as a quick test
cursor=connection.cursor()
query_test='select * from dm_cnf_date where rownum < 2'
for row in cursor.execute(query_test):
print(row)
This returns a tuple with a row of data so far so good, looks like I can connect to the database and run a query.
Next I wanted to get the data into a Pandas dataframe and this is where I got stuck
I tried this code
df = pd.read_sql(sql=query_test, con=connection)
Which then I get hit with the following error
:1: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. df = pd.read_sql(sql=query_test, con=connection)
I was loosely trying to follow this article ("Read data as pandas DataFrame"): https://kontext.tech/article/1019/python-read-data-from-oracle-database
but it didnt seem to work.
I tried to take a look at the sqlalchemy site here: https://docs.sqlalchemy.org/en/20/dialects/oracle.html#module-sqlalchemy.dialects.oracle.oracledb
Which I tried to rewrite my code a bit as follows
conn_url="oracle+oracledb://TEST:TESTING@TESTDB:1234/TEST"
engine=create_engine(conn_url)
df = pd.read_sql(sql=query_test, con=engine)
And I get hit with another error
OperationalError: DPY-6003: SID "TEST" is not registered with the listener at host "TESTDB" port 1234. (Similar to ORA-12505)
Just looking to connect to an Oracle DB and grab data into a Pandas dataframe but keep hitting a wall
Any insight would be very much appreciated
Try either of:
#oracledb.defaults.arraysize = 1000 # Tune for big queries
with oracledb.connect(user=un, password=pw, dsn=cs) as connection:
with connection.cursor() as cursor:
cursor.execute("select * from emp")
col_names = [c.name for c in cursor.description]
data = cursor.fetchall()
df = pandas.DataFrame(data, columns=col_names)
print(df)
or
engine = sqlalchemy.create_engine(
'oracle+oracledb://',
thick_mode=None,
connect_args={
"user": un,
"password": pw,
"dsn": cs
}
#, echo=True # SQLAlchemy tracing / logging / debugging
)
emp_sql = "select * from emp"
df_emp = pd.read_sql(emp_sql, engine)
print(df_emp)
The latter is slower. You can enable the tracing and see the number of queries that are executed internally by SQLAlchemy.
Regarding the ORA-12505 error using:
"oracle+oracledb://TEST:TESTING@TESTDB:1234/TEST"
you probably needed to use:
"oracle+oracledb://TEST:TESTING@TESTDB:1234?service_name=TEST"
see https://docs.sqlalchemy.org/en/20/dialects/oracle.html#dialect-oracle-oracledb-connect
Back in the old days, 'system identifiers' (SIDs) were commonly used for connecting to Oracle DB. SQLAlchemy defaults to use these in its syntax. However most DBs are now connected to via a 'service name', hence the need for the extra connection keyword. See the note in the SQLAlchemy doc:
Note that although the SQLAlchemy URL syntax hostname:port/dbname looks like Oracle’s Easy Connect syntax, it is different. SQLAlchemy’s URL requires a system identifier (SID) for the dbname component
You should continue using your 'service name' TEST, and not try to find what SID to use.