Wrong encoding with Python decode in Windows command prompt


When I decode a UTF-8 byte string in a Windows command prompt and redirect the printed result to a file, the file ends up ANSI-encoded instead of UTF-8.

>python --version
Python 3.13.0
>python -c "print(b'\xc3\x96'.decode('utf-8'))" > test.txt

When I open test.txt in Notepad++, it reports the encoding as ANSI. If I run the same command in MSYS2 (using Python 3.11.6), the resulting encoding is UTF-8 as expected. Why is the encoding wrong when using the Windows command prompt?


Solution

  • Calling .decode() produces a Unicode string (a sequence of code points with no byte encoding attached). It's no different from writing:

    python -c "print('Ö')" > test.txt
    

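    print then has to encode that Unicode string when it writes to stdout, and the encoding it picks depends on the OS and on where stdout points.

    For example, on Windows, when stdout is redirected to a file, Python uses the default "ANSI" code page of that localized Windows version (Windows-1252 on US and Western European Windows versions). With that code page, Ö is written as the single byte 0xD6 rather than the UTF-8 bytes 0xC3 0x96.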

    Python's UTF-8 Mode overrides this. Enable it with either the -X utf8 command-line option or the environment variable PYTHONUTF8=1:

    python -X utf8 -c "print('Ö')" > test.txt
    

    Alternatively, the environment variable PYTHONIOENCODING (for example, PYTHONIOENCODING=utf-8) overrides the encoding of stdin/stdout/stderr directly, which also takes effect when Python I/O is redirected.
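
To see the effect of these settings without opening the file in an editor, a small script along the following lines can capture the raw bytes a child Python process writes to a redirected stdout. This is only a sketch: the helper name captured_stdout and the variable names are made up for illustration, and cp1252 is forced through PYTHONIOENCODING merely to imitate the "ANSI" default of a Western European Windows, so the script behaves the same on any platform.

```python
import os
import subprocess
import sys

# ASCII-only child program; '\xd6' is the escape for 'Ö' (U+00D6).
CHILD_CODE = r"print('\xd6')"


def captured_stdout(extra_args=(), extra_env=None):
    """Run a child Python with stdout piped and return the raw bytes it wrote.

    Hypothetical helper for illustration; not part of any library.
    """
    env = dict(os.environ)
    # Drop inherited overrides so each run only sees the setting under test.
    env.pop("PYTHONUTF8", None)
    env.pop("PYTHONIOENCODING", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # Strip the trailing newline (\n, or \r\n on Windows).
    return result.stdout.rstrip(b"\r\n")


utf8_option = captured_stdout(extra_args=["-X", "utf8"])
utf8_env = captured_stdout(extra_env={"PYTHONUTF8": "1"})
# Assumption: cp1252 stands in for the Windows "ANSI" default here.
ansi_like = captured_stdout(extra_env={"PYTHONIOENCODING": "cp1252"})

print(utf8_option)  # b'\xc3\x96'
print(utf8_env)     # b'\xc3\x96'
print(ansi_like)    # b'\xd6'
```

The two-byte result is the UTF-8 form of Ö, and the single byte 0xD6 is what ends up in test.txt when the ANSI code page is used, which is why Notepad++ reports the file as ANSI.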