I'm loading some Redshift data into pandas via pg8000:
>>> df = pd.read_sql(sql_text, pg8000_cnx)
pg8000_cnx
is a connection object to a Redshift database. I always get column names that come back as bytes, e.g.:
>>> df.columns
Index([b'column_',
b'column_2',
...
b'column_n'],
dtype='object', length=142)
I know I can use .decode('utf-8')
and a list comprehension to convert the columns back to strings, but I was trying to have them loaded as strings from Redshift itself. I tried playing with different CLIENT_ENCODING
settings on the pg8000 cursor, but I still get bytes column names in the pandas df.
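For reference, the decode workaround mentioned above might look something like this (a minimal sketch using a hand-built DataFrame with bytes column names, standing in for what older pg8000 versions return):

```python
import pandas as pd

# Simulate a DataFrame whose column names came back as bytes
df = pd.DataFrame([[1, 2]], columns=[b'column_1', b'column_2'])

# Decode each column name, leaving any that are already str untouched
df.columns = [c.decode('utf-8') if isinstance(c, bytes) else c
              for c in df.columns]

print(df.columns.tolist())  # → ['column_1', 'column_2']
```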
import pandas as pd
import pg8000

conn = pg8000.connect(**db_details)
cursor = conn.cursor()
cursor.execute("SET CLIENT_ENCODING TO 'UTF8'")
cursor.execute(sql)
cursor.fetchone()
# load df here
df = pd.read_sql(sql, conn)
Is there some redshift or pg8000 option I am missing that would load the column headers in utf-8 decoded strings from the db connection object itself?
This is fixed in pg8000 1.16.6, so upgrading to the latest version should solve the problem. From version 1.16.6, pg8000 returns column names as str
rather than bytes
.
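After upgrading (e.g. with `pip install --upgrade pg8000`), one way to confirm the installed version is new enough is a simple numeric comparison of dotted version strings. This is a hedged sketch; `version_at_least` is a hypothetical helper, and the installed version would come from `pg8000.__version__` or `importlib.metadata.version("pg8000")`:

```python
def version_at_least(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.16.6' >= '1.9.2'."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# e.g. version_at_least(importlib.metadata.version("pg8000"), "1.16.6")
print(version_at_least("1.16.6", "1.16.6"))  # → True
print(version_at_least("1.9.2", "1.16.6"))   # → False (9 < 16 numerically)
```

Note the numeric comparison matters: a plain string comparison would wrongly rank "1.9.2" above "1.16.6".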