[SOLVED] How to avoid byte quoted column headers when reading redshift data with pg8000

How to avoid byte quoted column headers when reading redshift data with pg8000

I'm loading some redshift data in pandas via pg8000:

>>>> df = pd.read_sql(sql_text, pg8000_cnx)

pg8000_cnx is a connection object to a redshift database. I always get columns that are byte quoted, ex:

>>> df.columns
Index([b'column_',
       b'column_2',
       ...
       b'column_n'],
       dtype='object', length=142)

I know I can use .decode('utf-8') and some list comprehension to convert the columns back to string, but I was trying to have them loaded in string format from redshift itself. Tried playing with different CLIENT_ENCODING settings with the pg8000 cursor, but still get byte quoted columns in the pandas df.

import pg8000
conn = pg8000.connect(**db_details)
cursor = conn.cursor()
cursor.execute("SET CLIENT_ENCODING TO 'UTF8'")
cursor.execute(sql)
cursor.fetchone()
# load df here

Is there some redshift or pg8000 option I am missing that would load the column headers in utf-8 decoded strings from the db connection object itself?

Solution

This is fixed in pg8000 1.16.6, so upgrading to the latest version should solve the problem.

So from verion 1.16.6 pg8000 returns column names as str rather than bytes.