I'm trying to take data from an Oracle db, just to have it on a csv file. When i try to do it with the default encoding i get the error of: 'utf-8' codec can't decode byte 0xd1 in position 14: invalid continuation byte.. The character set from the db is AL32UTF8 and for the national is AL16UTF8. Some clients have put records on spanish, so some charactes ñ are replaced by and ?. I've tried to change the encode to ISO-8859-1 and that works for the first 2 querys, but to the others i have the partial multibyte character error. I also check the data with the query and there's others ? replacements.
Here is my code (please ignore the credentials and the querys i just put some random values):
os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.AL32UTF8'
connection = None
cursor = None
dir = './csv_files/'
queries = {
'file_name.csv': "SELECT * table",
'file_name.csv': "SELECT * FROM table",
}
if not os.path.exists(dir):
os.makedirs(dir)
try:
dsn_tns = cx_Oracle.makedsn('171.11.1.111', '1521', sid='ñññ')
connection = cx_Oracle.connect(user, password, dsn=dsn_tns, encoding='ISO-8859-1')
cursor = connection.cursor()
for file_name, query in queries.items():
try:
cursor.execute(query)
rows = cursor.fetchall()
columns = [col[0] for col in cursor.description]
dataFrame = pd.DataFrame(rows, columns=columns)
file_path = os.path.join(dir, file_name)
dataFrame.to_csv(file_path, index=False, encoding='utf-16', errors= 'replace')
except Exception as e:
print(f"Error in query for {file_name}: {e}")
continue
except Exception as e:
print(f"Error in connection: {e}")
finally:
if cursor is not None and connection is not None:
cursor.close()
How can i handle this issue without deleting the conflict data from the db?
I've tried changing the encoding, but none of them works.
Putting the comment as an answer:
(1) Check for corrupt data or query raw data. Note that both of these identify data issues in your database that you should fix.
(2) Use python-oracledb which replaced cx_Oracle
(3) Skip the overhead of pandas and write data using cvs, see https://github.com/oracle/python-oracledb/blob/main/samples/write_csv.py
(4) Tune the arraysize when fetching (see previous example)