Tags: python, mysql, vim, encoding, smart-quotes

Junk characters (smart quotes, etc.) in output file


I am reading a number of strings from a MySQL database using Python and, after some processing, writing them to a CSV file. However, garbage characters appear in the CSV file: when I open it in gvim, I see characters like <92>, <89>, <94>, etc.
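gvim shows a byte as <92> when that byte is not valid in the encoding it used to read the file (typically UTF-8). The values in the question are Windows-1252 (cp1252) codes for typographic punctuation, which matches the "smart quotes" in the title. Below is a minimal Python 3 sketch that decodes those bytes. It assumes the stored data is cp1252, and the byte string is made up for illustration.

```python
import unicodedata

# Hypothetical raw bytes as they appear in the CSV; assumed to be cp1252.
raw = bytes([0x92, 0x89, 0x93, 0x94])

# Pair each raw byte with the character cp1252 maps it to.
for byte, char in zip(raw, raw.decode("cp1252")):
    # Print the byte the way gvim shows it (<92>), then the character and its name.
    print(f"<{byte:02x}> -> {char} {unicodedata.name(char)}")
```

Under this assumption, <92>, <93>, and <94> are the right single quote and the left and right double quotes, and <89> is the per mille sign.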

Any thoughts? I tried calling string.encode('utf-8') before writing to the CSV, but that raised UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 905: ordinal not in range(128).
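In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec. ASCII only covers bytes 0 to 127, so byte 0x93 fails. The bytes must first be decoded with the encoding they were actually stored in, and only then re-encoded as UTF-8. Python 3 has no implicit step, so the sketch below reproduces the failure with an explicit .decode('ascii'). The sample bytes and the cp1252 source encoding are assumptions.

```python
# Hypothetical bytes as read from the database; 0x93/0x94 are cp1252 curly quotes.
raw = b"He said \x93hello\x94"

# Decoding as ASCII fails the same way Python 2's implicit decode did.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x93 in position 8: ordinal not in range(128)

# Decode with the (assumed) real source encoding first, then encode as UTF-8.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")
print(text)
```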


Solution

  • I eventually solved it. I was using the MySQLdb Python module to connect to MySQL. I passed charset='utf8' and use_unicode=True when creating the database connection, and I changed the MySQL table's collation to utf8_unicode_ci. Finally, when writing each string to the CSV file, I used:

    file_pointer.write(my_string.encode('ascii', 'ignore'))
    

    I don't know how sound the logic is, but this is what I unearthed after several hours of googling, and it seems to work for me.
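Note that encode('ascii', 'ignore') does not convert the smart quotes. It silently drops every non-ASCII character. With use_unicode=True the connection already returns Unicode strings, so another option is to write them out as UTF-8 and keep the characters. The Python 3 sketch below uses the standard csv module. The rows and the file name output.csv are hypothetical.

```python
import csv

# Hypothetical rows, standing in for what a charset='utf8', use_unicode=True connection returns.
rows = [("id", "quote"), (1, "It\u2019s \u201cquoted\u201d")]

# What the workaround above does: non-ASCII characters are silently removed.
print(rows[1][1].encode("ascii", "ignore"))  # b'Its quoted'

# Alternative: keep the characters by writing the CSV as UTF-8.
# newline="" is what the csv module documentation recommends for files it writes.
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)
```

gvim should then display the curly quotes themselves, provided it reads the file as UTF-8.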