I'm connecting to a Oracle database using the following code:
jar = ojdbc8.jar path
jvm_path = jvm.dll path
args = '-Djava.class.path=%s' % jar
jpype.startJVM(jvm_path, args)
con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", url,[user, password], jar)
The connection works fine, however the data is returned in this odd format.
pd.read_sql("SELECT * FROM table1", con)
yields
+---+-----------------+-----------------+-----------------+
| | (C,O,L,U,M,N,1) | (C,O,L,U,M,N,2) | (C,O,L,U,M,N,3) |
+---+-----------------+-----------------+-----------------+
| 1 | (t,e,s,t) | (t,e,s,t,2) | 1 |
+---+-----------------+-----------------+-----------------+
| 2 | (f,o,o) | (b,a,r) | 100 |
+---+-----------------+-----------------+-----------------+
The number and dates are imported correctly, but not the varchar
columns. I tried different tables and all of them have this problem.
I haven't seen anything like that anywhere. Hope you can help me.
This seems to be a problem when using jaydebeapi
with jpype
. I can reproduce this when connecting to a Oracle db in the same way that you do (in my case Oracle 11gR2, but since you are using ojdbc8.jar
, I guess it also happens with other versions).
There are different ways you can solve this:
Since the error only seems to occur in a specific combination of packages, the most sensible thing to do is to try and avoid these and thus the error altogether.
Alternative 1: Use jaydebeapi
without jpype
:
As noted, I only observe this when using jaydebeapi
with jpype
. However, in my case, jpype
is not needed at all. I have the .jar
file locally and my connection works fine without it:
import jaydebeapi as jdba
import pandas as pd
import os
db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'
jar=os.getcwd()+'/ojdbc6.jar'
conn = jdba.connect('oracle.jdbc.driver.OracleDriver',
'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid,
{'user': 'USERNAME', 'password': 'PASSWORD'},
jar
)
df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)
conn.close()
In my case, this works fine and creates the dataframes normally.
Alternative 2: Use cx_Oracle
instead:
The issue also does not occur if I use cx_Oracle
to connect to the Oracle db:
import cx_Oracle
import pandas as pd
import os
db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'
dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)
df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)
cx_conn.close()
Note: For cx_Oracle
to work you have to have the Oracle Instant Client installed and properly set up (see e.g. cx_Oracle documentation for Ubuntu).
If for some reason, you cannot use the above connection alternatives, you can also transform your dataframe.
Alternative 3: join tuple entries:
You can use ''.join()
to convert tuples to strings. You need to do this for the entries and the column names.
# for all entries that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)
# also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
Alternative 4: change dtype of columns:
By changnig the dtype
of an affected column from object
to string
, all entries will also be converted. Note that this may have unwanted side-effects, like e.g. changing None
values to the string <N/A>
. Also, you will have to rename the column headings separately, as above.
for col in df.select_dtypes(include=['object']).columns:
df[col] = df[col].astype('string')
# again, rename headings
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
All of these should yield more or less the same df
in the end (apart from the dtypes
and possible replacement of None
values):
+---+---------+---------+---------+
| | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test | test2 | 1 |
+---+---------+---------+---------+
| 2 | foo | bar | 100 |
+---+---------+---------+---------+