When I decode a UTF-8-encoded byte string and redirect the printed result to a file in a Windows command prompt, the file ends up ANSI-encoded.
>python --version
Python 3.13.0
>python -c "print(b'\xc3\x96'.decode('utf-8'))" > test.txt
When I open test.txt in Notepad++, it says the encoding is ANSI. If I run the same command in MSYS2 (using Python 3.11.6), the resulting encoding is UTF-8, as expected. Why is the encoding wrong when using the Windows command prompt?
When you .decode() a byte string, you get a Unicode str (a sequence of code points with no encoding attached). It's no different from writing:
python -c "print('Ö')" > test.txt
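A quick way to see this in a Python session is to decode the bytes and inspect the result. The snippet below only uses built-ins; the variable names are illustrative:

```python
# Decoding turns bytes into a str: a sequence of Unicode code points.
raw = b'\xc3\x96'             # the UTF-8 encoding of 'Ö'
text = raw.decode('utf-8')

print(type(text).__name__)    # str
print(len(text))              # 1: a single code point, not two bytes
print(hex(ord(text)))         # 0xd6, i.e. U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
print(text == 'Ö')            # True: identical to writing the literal directly
```

Nothing about UTF-8 survives in text; the encoding used on output is decided later, when the str is written.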
print then writes that str to stdout, encoding it in an OS-dependent way.
For example, on Windows, when stdout is redirected to a file, Python encodes the output with the default "ANSI" code page of that localized Windows version (Windows-1252 on US and Western European Windows versions).
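To make the difference concrete, here is a small sketch comparing the bytes each encoding produces for the same character, plus the standard-library calls that report which encoding the current process would use. The printed encoding names depend on the OS, locale, and whether stdout is redirected, so no output is claimed for them. A lone 0xD6 byte is not valid UTF-8, which is presumably why Notepad++ reports the file as ANSI.

```python
import locale
import sys

text = 'Ö'  # U+00D6

# The same str becomes different bytes depending on the encoding used to write it.
ansi_bytes = text.encode('cp1252')  # Windows-1252, the "ANSI" code page on US/Western European Windows
utf8_bytes = text.encode('utf-8')

print(ansi_bytes)  # b'\xd6'
print(utf8_bytes)  # b'\xc3\x96'

# Encoding Python chose for stdout in this process, and the locale encoding it
# typically uses when stdout is redirected (values vary by OS and settings).
print(sys.stdout.encoding)
print(locale.getpreferredencoding(False))
```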
UTF-8 Mode overrides this. Enable it with either the -X utf8 Python option or by setting the environment variable PYTHONUTF8=1:
python -X utf8 -c "print('Ö')" > test.txt
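The effect of -X utf8 can be checked from Python itself. This sketch assumes that capturing a child interpreter's stdout through a pipe behaves like redirecting it to a file (both are non-console outputs). It removes PYTHONIOENCODING from the child's environment because that variable takes precedence over UTF-8 Mode for the standard streams. The code passes the character as the ASCII escape \u00d6 so the command line itself stays ASCII.

```python
import os
import subprocess
import sys

# Child environment without PYTHONIOENCODING, so only UTF-8 Mode decides the encoding.
env = dict(os.environ)
env.pop('PYTHONIOENCODING', None)

# Equivalent of: python -X utf8 -c "print('Ö')" > test.txt, but captured through a pipe.
result = subprocess.run(
    [sys.executable, '-X', 'utf8', '-c', "print('\\u00d6')"],
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)

# UTF-8 bytes for 'Ö' followed by a newline (\r\n on Windows, \n elsewhere).
print(result.stdout)
```

In cmd.exe, the environment-variable route would be `set PYTHONUTF8=1` before running the original command.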
The environment variable PYTHONIOENCODING can also be used to directly override the encoding of stdin/stdout/stderr when redirecting Python I/O.
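As a sketch, the snippet below sets PYTHONIOENCODING only in a child process's environment and captures its stdout through a pipe. It assumes, as above, that a pipe stands in for file redirection. In cmd.exe the equivalent would be `set PYTHONIOENCODING=utf-8` before running the command. The variable also accepts an `encoding:errors` form such as `utf-8:strict`.

```python
import os
import subprocess
import sys

# Copy the current environment and force UTF-8 for the child's stdio streams.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

# Child prints U+00D6 ('Ö'); the escape keeps the command line ASCII-only.
result = subprocess.run(
    [sys.executable, '-c', "print('\\u00d6')"],
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)

# Expect the UTF-8 bytes b'\xc3\x96' followed by a line ending.
print(result.stdout)
```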