I have a DataFrame and want to write a few of its rows to a file and to a logger in Python 2.7. print(dataframe.iloc[0:4]) outputs a nice grid of the column headers and the top 4 rows of the dataframe. However, logging.info(dataframe.iloc[0:4]) throws:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 87: ordinal not in range(128)
Here is the console output, which works either by evaluating the expression directly or via print() (note the ²):
In[89]: d.iloc[0:4] OR print(d.iloc[0:4])
Out[89]:
ISO ID_0 NAME_0 ID_1 NAME_1 ID_2 NAME_2 Area(km.²) Pop2001_Cen Pop2010_Cen HHold2010 Hhold_Size
0 ARG 12 Argentina 2 Ciudad Autónoma de Buenos Aires NaN NaN 203.0 2776138.0 2890151 1150134.0 2.512882
1 ARG 12 Argentina 2 Ciudad Autónoma de Buenos Aires 2001.0 Comuna 1 NaN 171975.0 205886 84468.0 2.437444
2 ARG 12 Argentina 2 Ciudad Autónoma de Buenos Aires 2002.0 Comuna 2 NaN 165494.0 157932 73156.0 2.158839
3 ARG 12 Argentina 2 Ciudad Autónoma de Buenos Aires 2003.0 Comuna 3 NaN 184015.0 187537 80489.0 2.329971
file.write(dataframe.iloc[0:4]) and similar calls fail as well, since one of the column headers includes a non-ASCII character. I have tried all sorts of variations of decode(), encode(), etc., but cannot avoid this error.
print(d.iloc[0:4]) works, so another approach was to use print(d.iloc[0:4], file=f), but even with from __future__ import print_function I get the above ascii encoding error.
Other ways to replicate this problem are logging.info('Area(km.²)') or 'Area(km.²)'.decode().
How can I render this dataframe?
[Edit:]
I also want to understand fundamentally how to deal with string encoding/decoding in Python 2.7. I've been hacking away at this for more time than it deserves, because this isn't the only time I've had this UnicodeDecodeError. I don't know when it will occur, and I am still just throwing fixes at the console to see what sticks, without any underlying understanding of what's going on.
IIUC, you can try passing encoding='utf-8' when writing out the first n rows of the dataframe with:
df.head(n).to_csv('yourfileout.csv', encoding='utf-8')
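On the broader encoding question from the edit, here is a minimal sketch using only the standard library (no pandas), so treat it as an illustration rather than a drop-in fix. The idea is to keep text as unicode inside the program and encode to bytes only at the boundary (file or log handler). In Python 2.7 a plain 'Area(km.²)' literal is a byte string (assuming UTF-8 source, ² is the two bytes 0xc2 0xb2), and calling .decode() with no arguments falls back to the ascii codec, which is where the 0xc2 in the traceback comes from. The helper names write_text and read_text, the logger name, and the temporary file paths are hypothetical.

```python
import io
import logging
import os
import tempfile

# A unicode (text) string; the \u00b2 escape avoids depending on the source file encoding.
header = u'Area(km.\u00b2)'

# Encoding turns text into bytes; in UTF-8 the superscript two becomes 0xc2 0xb2.
raw = header.encode('utf-8')

# Decoding with an explicit codec turns those bytes back into text.
decoded = raw.decode('utf-8')


def write_text(path, text):
    """Write unicode text to path; io.open encodes it as UTF-8 on the way out."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def read_text(path):
    """Read a UTF-8 file back as unicode text."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Hypothetical output locations, used for illustration only.
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, 'rows.txt')
log_path = os.path.join(tmp_dir, 'rows.log')

write_text(out_path, header + u'\n')
round_trip = read_text(out_path)

# A file handler with an explicit encoding, so the logger writes UTF-8 as well.
logger = logging.getLogger('encoding_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path, encoding='utf-8')
logger.addHandler(handler)
logger.info(header)
handler.close()
logger.removeHandler(handler)
```

The same pattern should apply to any text you build from the dataframe: get it into a unicode string first, then hand it to a file or handler that was opened with an explicit encoding.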