I have a problem that I can't solve. We have a table in a Teradata database. Let's say our table has only one row (containing the Polish place name Świdnica) and one column (addresses).
Column 'addresses' is defined as:
addresses VARCHAR(80) CHARACTER SET LATIN NOT CASESPECIFIC
Let's say that we query the data:
select addresses from WH_table
and in Teradata Studio we get the expected result:
ŚWIDNICA
but when I try to query using the SQLAlchemy extension named teradatasqlalchemy, I get:
'\x8cWIDNICA'
My code looks like this:
import sqlalchemy as sqlal
import pandas as pd
engine = sqlal.create_engine('teradatasql://ip_to_my_db/?user=user&password=pass!&tmode=TERA', pool_size=5)
conn = engine.connect()
try:
    query = """select addresses from WH_table"""
    data = pd.read_sql(query, con=conn)
    print(data)
except Exception as e:
    print(e)
As far as I know, teradatasqlalchemy returns query results as UTF-8. I tried encoding and decoding the data using UTF-8, Latin-1, cp1252 and ISO-8859-1, but I still can't get it to simply return 'ŚWIDNICA'. I could replace \x8c with Ś, but as you would expect, that only works for a handful of examples, not for the whole table, which has more than a thousand rows.
Using the answers from the comments under the original question, I figured out a way to make it work. It's not the most optimized answer, but it works for small dataframes.
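The key turned out to be cp1250, the Windows code page used for Polish. For a single value the fix looks roughly like this (the latin-1 round trip is my assumption about how the garbled code points map back to the original bytes, so treat it as a sketch rather than a guaranteed rule):

s = '\x8cWIDNICA'
# each code point in the garbled string corresponds to one original byte,
# so re-encode with latin-1 to get those bytes back, then decode them as cp1250
fixed = s.encode('latin-1').decode('cp1250')
print(fixed)  # ŚWIDNICA

Applied item by item to the whole dataframe, my version looks like this: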
import sqlalchemy as sqlal
import pandas as pd
engine = sqlal.create_engine('teradatasql://ip_to_my_db/?user=user&password=pass!&tmode=TERA', pool_size=5)
conn = engine.connect()
try:
    query = """select addresses from WH_table"""
    data = pd.read_sql(query, con=conn)  # read data from db using the query
    for column, series in data.items():
        for item in series:
            # change the string to hex (zero-padded, so code points below 0x10 don't break fromhex)
            str_to_hex = ''.join(format(ord(x), '02x') for x in item)
            # change the hex to bytes and decode using the encoding of your choice
            hex_to_word = bytes.fromhex(str_to_hex).decode('cp1250')
            # replace the word in the dataframe
            series.replace(item, hex_to_word, inplace=True)
except Exception as e:
    print(e)
It works well for a small dataframe, but going through every item one by one is, as you can imagine, not very efficient.
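If the dataframe grows, a column-wise version should give the same result without the Python-level loops. This is only a sketch (untested against the original database, and it assumes every garbled string contains nothing but code points up to 0xFF, so the latin-1 round trip is lossless):

import sqlalchemy as sqlal
import pandas as pd

engine = sqlal.create_engine('teradatasql://ip_to_my_db/?user=user&password=pass!&tmode=TERA', pool_size=5)
conn = engine.connect()

query = """select addresses from WH_table"""
data = pd.read_sql(query, con=conn)

# re-encode every string column back to its original bytes and decode it as cp1250,
# letting pandas vectorize the work instead of looping over individual items
for column in data.select_dtypes(include='object').columns:
    data[column] = data[column].str.encode('latin-1').str.decode('cp1250')
print(data)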