Tags: python, encoding, file, utf-8

How to convert a file to utf-8 in Python?


I need to convert some files to UTF-8 in Python, and I'm having trouble with the conversion itself. I'd like to do the equivalent of:

iconv -t utf-8 "$file" > "converted/$file" # this is shell code

Thanks!


Solution

  • You can use the codecs module, like this:

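    import codecs

    BLOCKSIZE = 1048576  # or some other desired chunk size, in bytes

    # sourceFileName and targetFileName are placeholders for your own paths;
    # replace "your-source-encoding" with the file's actual encoding.
    with codecs.open(sourceFileName, "r", "your-source-encoding") as sourceFile:
        with codecs.open(targetFileName, "w", "utf-8") as targetFile:
            # Copy in chunks so large files are never held fully in memory.
            while True:
                contents = sourceFile.read(BLOCKSIZE)
                if not contents:  # an empty string signals end of file
                    break
                targetFile.write(contents)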
    

    EDIT: added BLOCKSIZE parameter to control file chunk size.
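
On Python 3, the same chunked copy can probably be written with the built-in open(), which accepts an encoding argument directly. The following is only a sketch under assumptions: the function name convert_to_utf8, its parameters, and the latin-1 sample file are made up for illustration and are not part of the answer above. Note that in text mode read() counts characters rather than bytes.

```python
import os
import tempfile

BLOCKSIZE = 1048576  # characters per read() call (text mode counts characters)


def convert_to_utf8(source_path, target_path, source_encoding, blocksize=BLOCKSIZE):
    """Re-encode source_path (hypothetical helper) into target_path as UTF-8."""
    # newline="" disables newline translation, so line endings are copied as-is.
    with open(source_path, "r", encoding=source_encoding, newline="") as source_file:
        with open(target_path, "w", encoding="utf-8", newline="") as target_file:
            while True:
                contents = source_file.read(blocksize)
                if not contents:  # empty string means end of file
                    break
                target_file.write(contents)


# Small demonstration with an assumed latin-1 input file.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "input.txt")
    target = os.path.join(workdir, "output.txt")
    with open(source, "wb") as f:
        f.write("caf\u00e9\n".encode("latin-1"))
    convert_to_utf8(source, target, "latin-1")
    with open(target, "rb") as f:
        print(f.read())  # the same text, now encoded as UTF-8
```

The source encoding still has to be known in advance, just as with iconv's -f option; a wrong guess either raises UnicodeDecodeError or silently produces the wrong characters.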