python pandas dataframe unicode

Write or log the print output of a pandas DataFrame


I have a DataFrame and want to write a few of its rows to a file and to a logger in Python 2.7. print(dataframe.iloc[0:4]) outputs a nice grid of the column headers and the top 4 rows of the DataFrame. However, logging.info(dataframe.iloc[0:4]) throws:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 87: ordinal not in range(128)

Here is the console output, which works both when the expression is evaluated directly and via print() (note the ²):

In[89]: d.iloc[0:4]    OR   print(d.iloc[0:4])
Out[89]: 
   ISO  ID_0     NAME_0  ID_1                           NAME_1    ID_2    NAME_2  Area(km.²)  Pop2001_Cen  Pop2010_Cen  HHold2010  Hhold_Size
0  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires     NaN       NaN       203.0    2776138.0      2890151  1150134.0    2.512882
1  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2001.0  Comuna 1         NaN     171975.0       205886    84468.0    2.437444
2  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2002.0  Comuna 2         NaN     165494.0       157932    73156.0    2.158839
3  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2003.0  Comuna 3         NaN     184015.0       187537    80489.0    2.329971

file.write(dataframe.iloc[0:4]) and similar calls fail the same way as logging.info(), because one of the column headers includes a non-ASCII character. I have tried all sorts of variations of decode(), encode(), etc., but cannot avoid this error.

print(d.iloc[0:4]) works, so another approach was to use print(d.iloc[0:4], file=f), but even with from __future__ import print_function I get the same ASCII decoding error.

Other ways to replicate this problem are logging.info('Area(km.²)') or 'Area(km.²)'.decode()
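
A minimal reproduction without pandas might look like the sketch below. It assumes the header is stored as UTF-8 bytes, which would match the 0xc2 in the traceback (² is encoded as 0xc2 0xb2 in UTF-8). The names header, failed_at and decoded are only for illustration.

```python
# Assumption: the header is a UTF-8 encoded byte string, which is what a
# plain '...' literal holds in a UTF-8 encoded Python 2.7 source file.
header = b'Area(km.\xc2\xb2)'

try:
    # Python 2 falls back to the ASCII codec when no encoding is named.
    header.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    # Index of the first byte that ASCII cannot decode (the 0xc2 byte).
    failed_at = exc.start

# Naming the actual encoding turns the bytes into unicode text.
decoded = header.decode('utf-8')
```

So the error seems to appear whenever a byte string containing non-ASCII bytes is decoded without an explicit encoding.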

How can I render this dataframe?

[Edit:]

I also want to understand fundamentally how to deal with string encoding and decoding in Python 2.7. I've been hacking away at this for more time than it deserves, because this isn't the only time I've hit a UnicodeDecodeError. I don't know when it will occur, and I am still just throwing fixes at the console to see what sticks, without any underlying understanding of what's going on.


Solution

  • IIUC, you can try to pass encoding='utf-8' when writing out the first n rows of the dataframe with:

    df.head(n).to_csv('yourfileout.csv', encoding='utf-8')
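
For the plain-file and logger cases, the same idea (name the encoding explicitly instead of relying on the ASCII default) can be sketched roughly as follows. This is an illustration rather than a tested pandas recipe: to_text, the file names, the logger name and the sample rendered string are all made up here, and the sample stands in for the rendered frame.

```python
import io
import logging
import os
import tempfile


def to_text(value, encoding='utf-8'):
    """Return value as unicode text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for the rendered rows (assumption: UTF-8 bytes, as in the question).
rendered = b'   ISO  Area(km.\xc2\xb2)\n0  ARG       203.0'
text = to_text(rendered)

# io.open encodes on write, so write() should only be given unicode text.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'preview.txt')
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(text)

# A FileHandler with an explicit encoding can emit non-ASCII messages.
log_path = os.path.join(out_dir, 'preview.log')
handler = logging.FileHandler(log_path, encoding='utf-8')
logger = logging.getLogger('frame_preview')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info('%s', text)
handler.close()
```

With a real frame, rendered would presumably come from something like df.head(n).to_string(); whether that returns bytes or unicode text under Python 2.7 may depend on the setup, which is why to_text accepts both.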