I am reading a bunch of strings from a MySQL database using Python and, after some processing, writing them to a CSV file. However, I see junk characters in the CSV file. For example, when I open the CSV in gvim, I see characters like <92>, <89>, <94>, etc.
Any thoughts? I tried calling string.encode('utf-8') before writing to the CSV, but that raised an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 905: ordinal not in range(128)
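A possible explanation, sketched in Python 3 where bytes and text are separate types: in Python 2, calling encode on a byte string first decodes it implicitly as ASCII, which fails on any byte above 0x7F. The bytes 0x92, 0x93 and 0x94 happen to be curly quotes in Windows-1252, so the sample below assumes (this is only a guess) that the data is in that encoding. The variable names and the sample bytes are illustrative.

```python
# Raw bytes as they might come back from the database.
# Assumption: the text is Windows-1252 (cp1252) encoded.
raw = b"\x93quoted\x94 it\x92s"

# Decoding as ASCII fails, much like the implicit decode in Python 2.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the (assumed) real encoding gives proper text,
# which can then be encoded as UTF-8 without errors.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")

print(ascii_error)
```

If the guess about the encoding is wrong, the decode step will produce different characters, so it is worth checking a few known strings by eye.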
I eventually solved it. I was using the MySQLdb Python module to connect to MySQL. I passed charset='utf8' and use_unicode=True when creating the database connection. I also changed the MySQL table's collation to utf8_unicode_ci. Finally, when writing my string to the CSV file, I used:
file_pointer.write(my_string.encode('ascii', 'ignore'))
I don't know how sound the logic is, but this is what I unearthed after several hours of googling, and it seems to work for me.
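For what it is worth, encode('ascii', 'ignore') silently drops every non-ASCII character, so the junk disappears but so do any accented letters or curly quotes. A minimal sketch of an alternative, assuming Python 3 and that the strings are already decoded text: write the file as UTF-8 through the csv module. The function name write_rows and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

def write_rows(path, rows):
    """Write rows of text to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)

# What the 'ignore' approach does: non-ASCII characters are dropped.
lossy = "caf\u00e9 \u201cquote\u201d".encode("ascii", "ignore")

# Writing UTF-8 instead keeps the characters intact.
rows = [["id", "text"], ["1", "caf\u00e9 \u201cquote\u201d"]]
path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_rows(path, rows)

# Read the file back to confirm nothing was lost.
with open(path, encoding="utf-8", newline="") as handle:
    read_back = list(csv.reader(handle))
```

Whether this is preferable depends on what reads the CSV afterwards: a consumer that only understands ASCII would still need the lossy version.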