Tags: python, windows, powershell, encoding, utf-8

How do I get UTF-8 to work flawlessly in modern PowerShell on Windows?


I have a C++ program that outputs raw UTF-8. It works flawlessly on Linux, but in Windows shells the output is garbled: "®" turns into "┬«" and "©" turns into "┬⌐", for example. There is also a Python part to the code, which seems to behave better when printing to the shell, so I tested Python's output a bit.

PS C:\Users\user> python -c 'print("\N{GREEK CAPITAL LETTER DELTA}")' > test_file_python.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0394' in position 0: character maps to <undefined>
PS C:\Users\user> python -X utf8 -c 'print("\N{GREEK CAPITAL LETTER DELTA}")' > test_file_python.txt
PS C:\Users\user> cat test_file_python.txt
Δ
PS C:\Users\user> python -c 'print("\N{GREEK CAPITAL LETTER DELTA}")'
Δ
PS C:\Users\user> cat .\test_file_python_wsl.txt  # Generated in WSL with the above commands
Δ
PS C:\Users\user> Format-Hex .\test_file_python.txt

   Label: C:\Users\user\test_file_python.txt

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 E2 95 AC C3 B6 0D 0A                            �ö��

PS C:\Users\user> Format-Hex .\test_file_python_wsl.txt

   Label: C:\Users\user\test_file_python_wsl.txt

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 CE 94 0A                                        ��

I do not understand how PowerShell handles encoding, how Python can get this right when writing to the shell but not when redirecting, or why something that works perfectly in Linux Bash in WSL has these sorts of issues in the newer cross-platform PowerShell Core, which should "just work". These are multiple questions, but they probably have a common answer.

EDIT: I forgot to add some important information: I am using PowerShell Core v7.3.6 with these encoding settings:

PS C:\Users\user> $OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 65001

Solution

  • On Windows, there are two pieces to the puzzle:


    Alternatively, as a one-time configuration step, you can switch your machine to use UTF-8 system-wide, in which case the above steps aren't necessary; however, this has far-reaching consequences and can break legacy scripts and applications - see this answer.
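Whichever route is taken, it can help to check what Python itself believes its encodings are. The following is a small diagnostic sketch, not part of the original answer; `describe_encodings` is a hypothetical helper name. It reports to stderr so that redirecting stdout with `>` does not hide the report.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encoding-related settings Python is currently using."""
    return {
        # True when started with -X utf8 or with PYTHONUTF8=1 in the environment.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Locale encoding that Python falls back on for redirected text streams.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the current stdout object (may be None for some streams).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Whether stdout is attached to a terminal rather than a file or pipe.
        "stdout_is_tty": bool(sys.stdout and sys.stdout.isatty()),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}", file=sys.stderr)
```

Running it once directly and once with `> out.txt` shows whether the stdout encoding changes between console output and redirected output.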


    Background information:

    PowerShell is partly a good Windows console citizen:
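The byte dumps in the question are consistent with PowerShell decoding the external program's UTF-8 output using a legacy OEM code page and then saving the resulting text as UTF-8. A minimal Python sketch of that round trip follows. It assumes the console code page is CP437, which is inferred from the mojibake in the question ("©" becoming "┬⌐") rather than stated there; `misdecode` is a hypothetical helper.

```python
def misdecode(utf8_bytes: bytes, console_codepage: str = "cp437") -> str:
    """Decode UTF-8 bytes as if they were text in a legacy console code page."""
    return utf8_bytes.decode(console_codepage)


samples = (
    "\N{REGISTERED SIGN}",
    "\N{COPYRIGHT SIGN}",
    "\N{GREEK CAPITAL LETTER DELTA}",
)

for ch in samples:
    raw = ch.encode("utf-8")           # what the program actually writes
    garbled = misdecode(raw)           # what the shell believes it received
    resaved = garbled.encode("utf-8")  # what ends up in the redirected file
    # ascii() keeps the report printable even on a non-UTF-8 console.
    print(raw.hex(" ").upper(), "->", ascii(garbled), "->", resaved.hex(" ").upper())
```

For the Greek delta, the UTF-8 bytes `CE 94` are misread as "╬ö", which re-encodes to `E2 95 AC C3 B6`, the same bytes that `Format-Hex` shows for test_file_python.txt before the trailing `0D 0A`.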

    Python exhibits nonstandard behavior:
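The traceback in the question shows Python encoding redirected output with cp1252 (note the `encodings\cp1252.py` frame), whereas `-X utf8` makes it emit UTF-8. A minimal sketch of the difference, assuming cp1252 is the machine's ANSI code page as that traceback suggests; `encode_or_none` is a hypothetical helper.

```python
from typing import Optional


def encode_or_none(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


delta = "\N{GREEK CAPITAL LETTER DELTA}"

for encoding in ("cp1252", "utf-8"):
    encoded = encode_or_none(delta, encoding)
    if encoded is None:
        # Same failure as the UnicodeEncodeError in the question's first command.
        print(f"{encoding}: cannot encode U+0394")
    else:
        print(f"{encoding}: {encoded.hex(' ').upper()}")
```

cp1252 has no mapping for U+0394, which is why the first redirected command fails outright, while UTF-8 yields `CE 94`, the bytes found in the WSL-generated file.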