I've seen similar questions, but the answers to them didn't help. This code:
import codecs

with codecs.open(sourceFileName, "r", sourceEncoding) as sourceFile:
    contents = sourceFile.read()
with codecs.open(sourceFileName, "w", "utf-8") as targetFile:
    if contents:
        targetFile.write(contents)
returns the error "UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1: character maps to <undefined>".
This code:
with open(sourceFileName, "rb") as sourceFileBin:
    contents = sourceFileBin.read().decode(sourceEncoding)
with open(sourceFileName, "wb") as targetFile:
    targetFile.write(contents.encode("utf-8"))
produces the same error. The troublesome symbol is the Cyrillic letter 'И' (which, as far as I know, is represented by 0xC8, not 0x98). I'm using Python 2.7 on Windows.
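As a quick check of the byte values mentioned above, the following sketch (helper names are hypothetical; written to run on both Python 2.7 and 3.x) prints the cp1251 and UTF-8 encodings of 'И' (U+0418) and then decodes the UTF-8 bytes with the cp1251 codec. In UTF-8 the letter is the two bytes 0xD0 0x98, and 0x98 is unmapped in cp1251, so an error at position 1 is what you would expect if the input were already UTF-8.

```python
LETTER = u"\u0418"  # Cyrillic capital letter I (U+0418)


def hex_bytes(data):
    """Format a byte string as a list of hex literals, e.g. ['0xc8']."""
    return ["0x%02x" % b for b in bytearray(data)]


def decode_utf8_as_cp1251(text):
    """Encode `text` as UTF-8, then decode it with the (wrong) cp1251 codec.

    Returns the UnicodeDecodeError that is raised, or None if decoding succeeds.
    """
    try:
        text.encode("utf-8").decode("cp1251")
    except UnicodeDecodeError as exc:
        return exc
    return None


print(hex_bytes(LETTER.encode("cp1251")))  # ['0xc8']
print(hex_bytes(LETTER.encode("utf-8")))   # ['0xd0', '0x98']
# 'charmap' codec can't decode byte 0x98 in position 1: character maps to <undefined>
print(decode_utf8_as_cp1251(LETTER))
```

The first byte, 0xD0, is a valid cp1251 letter ('Р'), which is why decoding only fails at position 1.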
UPD: It turns out the original file encoding might not be cp1251; this error could be the result of a bug in a text editor. However, all my text editors read this file correctly. So I'm looking for a workaround, because files without this particular letter are converted correctly.
I found out that, due to some kind of bug (or just my stupidity), I was trying to convert an already converted file.
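A minimal sketch of a guard against converting the same file twice, assuming every input is either cp1251 or already UTF-8: read the raw bytes, leave the file alone if they already decode as strict UTF-8, and otherwise decode them as cp1251 and write UTF-8 back. The function name `convert_to_utf8` and its `source_encoding` default are hypothetical. This is a heuristic, not encoding detection: pure-ASCII files pass the UTF-8 check (harmlessly, since ASCII bytes are identical in both encodings), and some unusual cp1251 byte sequences can also happen to be valid UTF-8.

```python
def convert_to_utf8(path, source_encoding="cp1251"):
    """Re-encode the file at `path` to UTF-8 in place (hypothetical helper).

    If the raw bytes already decode as strict UTF-8, the file is assumed to be
    converted and is left untouched. Returns True if the file was rewritten.
    """
    with open(path, "rb") as f:
        raw = f.read()

    try:
        raw.decode("utf-8")
        # Already valid UTF-8: decoding it again as cp1251 would fail or garble it.
        return False
    except UnicodeDecodeError:
        pass  # Not UTF-8, so treat it as source_encoding.

    text = raw.decode(source_encoding)
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True
```

For example, calling `convert_to_utf8(sourceFileName)` twice on the same cp1251 file rewrites it the first time and returns False the second time instead of raising.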
I'm very sorry for wasting your time.