python encoding cp1251

Converting a file from cp1251 to UTF-8


I have seen similar questions, but their answers didn't help. This code:

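import codecs

# Read the whole file, decoding it from the source encoding (e.g. "cp1251").
with codecs.open(sourceFileName, "r", sourceEncoding) as sourceFile:
    contents = sourceFile.read()

# Write the text back to the same path as UTF-8 (this overwrites the original).
with codecs.open(sourceFileName, "w", "utf-8") as targetFile:
    if contents:
        targetFile.write(contents)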

raises the error "UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1: character maps to undefined"

This code:

# Read the raw bytes and decode them explicitly.
with open(sourceFileName, "rb") as sourceFileBin:
    contents = sourceFileBin.read().decode(sourceEncoding)

# Re-encode as UTF-8 and overwrite the same file.
with open(sourceFileName, "wb") as targetFile:
    targetFile.write(contents.encode("utf-8"))

produces the same error. The troublesome symbol is the Cyrillic letter 'И' (which, as far as I know, is represented by '0xc8', not '0x98'). I'm using Python 2.7 on Windows.
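
The byte values mentioned above can be checked directly. The following is only a minimal sketch (the variable names are illustrative and not part of the original code); it compares how 'И' is encoded in cp1251 and in UTF-8, then tries to decode the UTF-8 bytes as cp1251:

```python
# -*- coding: utf-8 -*-
letter = u"\u0418"  # Cyrillic capital letter 'И'

cp1251_bytes = letter.encode("cp1251")  # a single byte: 0xC8
utf8_bytes = letter.encode("utf-8")     # two bytes: 0xD0 0x98

print(repr(cp1251_bytes))
print(repr(utf8_bytes))

# Decoding the UTF-8 bytes as cp1251 fails on the second byte,
# because 0x98 is not assigned to any character in cp1251.
failing_position = None
try:
    utf8_bytes.decode("cp1251")
except UnicodeDecodeError as exc:
    failing_position = exc.start
    print(exc)
```

A 0x98 byte at position 1 is therefore consistent with input that starts with 'И' and is already UTF-8 encoded, rather than with a cp1251 file.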

UPD: It turns out the original file encoding might not be cp1251; this error could be the result of a bug in a text editor. However, all my text editors can read this file correctly. So I'm looking for a workaround, because files without this particular letter are converted correctly.


Solution

  • I found out that, due to some kind of bug (or just my stupidity), I was trying to convert an already converted file.

    I'm very sorry for wasting your time.
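
One possible way to guard against converting a file twice is to check whether its bytes already decode as UTF-8 before re-encoding. This is a hedged sketch, not the code used in the question: `convert_to_utf8` is a hypothetical helper, and the check is a heuristic (pure-ASCII files also count as valid UTF-8 and are left untouched, and a cp1251 file could in rare cases happen to be valid UTF-8 as well).

```python
import io


def convert_to_utf8(path, source_encoding="cp1251"):
    """Re-encode the file at `path` to UTF-8 in place.

    Returns True if the file was rewritten, False if it was skipped
    because its contents already decode as UTF-8.
    """
    with io.open(path, "rb") as source_file:
        raw = source_file.read()

    # Heuristic: if the bytes are already valid UTF-8, assume the file
    # has been converted before and leave it alone.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        return False

    # Otherwise decode from the assumed source encoding and rewrite.
    text = raw.decode(source_encoding)
    with io.open(path, "wb") as target_file:
        target_file.write(text.encode("utf-8"))
    return True
```

With this guard, calling the function a second time on the same path returns False instead of raising UnicodeDecodeError.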