PyPy3 doesn't display non-ASCII Unicode characters correctly.
As a simple example, the following:
b'\xce\x9e\xce\xad\xce\xbd\xce\xb7 \xce\x93\xce\xae\xce\xb9\xce\xbd\xce\xbf\xcf\x82'.decode('utf8')
should evaluate to my user name, 'Ξένη Γήινος'.
But the output of PyPy3 is this:
In [1]: b'\xce\x9e\xce\xad\xce\xbd\xce\xb7 \xce\x93\xce\xae\xce\xb9\xce\xbd\xce\xbf\xcf\x82'.decode('utf8')
Out[1]: '╬×╬¡╬¢╬À ╬ô╬«╬╣╬¢╬┐¤é'
I could find very little information about this by searching, but I did find this:
https://github.com/pypy/pypy/issues/4948
So this is a known issue that hasn't been fixed.
I tried to fix it using information from the linked page. The encoding and locale used by CPython are:
In [1]: import locale, os, sys
In [2]: locale.getdefaultlocale()
<ipython-input-2-64720e52add3>:1: DeprecationWarning: 'locale.getdefaultlocale' is deprecated and slated for removal in Python 3.15. Use setlocale(), getencoding() and getlocale() instead.
locale.getdefaultlocale()
Out[2]: ('en_US', 'cp1252')
In [3]: locale.getlocale()
Out[3]: ('English_United States', '1252')
In [4]: locale.getencoding()
Out[4]: 'cp1252'
In [5]: locale.LC_ALL
Out[5]: 0
In [6]: sys.getdefaultencoding()
Out[6]: 'utf-8'
And these are the ones used by PyPy3:
In [2]: import sys, locale
In [3]: locale.getencoding()
Out[3]: 'utf-8'
In [4]: locale.getlocale()
Out[4]: ('English_United States', '1252')
In [5]: sys.getdefaultencoding()
Out[5]: 'utf-8'
So it seems evident that the issue is caused by a mismatch between the code page Windows uses and the encoding PyPy3 uses. Windows uses 'cp1252' and CPython uses the same code page, but PyPy3 doesn't. Thus the fix is either to make PyPy3 use the 'cp1252' code page or to make the Windows console use 'utf-8'.
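To check which side of the mismatch is which, the console's own code pages can be read directly and compared with what Python reports. The following is a minimal sketch, not something taken from the linked issue: the helper names `console_code_pages` and `describe_encodings` are made up for illustration, and the Win32 calls `GetConsoleCP` and `GetConsoleOutputCP` are only reached on Windows.

```python
import ctypes
import locale
import sys


def console_code_pages():
    """Return (input_cp, output_cp) of the attached console, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


def describe_encodings():
    """Collect the encodings that matter for console output (hypothetical helper)."""
    return {
        "console_code_pages": console_code_pages(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for name, value in describe_encodings().items():
    print(f"{name} = {value!r}")
```

If the console output code page and the stdout encoding disagree, that would be consistent with the garbled output shown earlier.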
I tried many ways to fix it. Setting the environment variable doesn't work:
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" -Name "PYTHONIOENCODING" -Type STRING -Value "UTF-8"
The following also doesn't work:
chcp 65001
Python 3 has no built-in reload, so I cannot use the old sys.setdefaultencoding trick, and locale.setlocale(0) doesn't work either.
These are just a few of the methods I have tried.
But the following works:
PS C:\Users\xenig> [Console]::InputEncoding = New-Object System.Text.UTF8Encoding
PS C:\Users\xenig> [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
PS C:\Users\xenig> D:\Programs\pypy3\Scripts\ipython.exe
Python 3.11.11 (0253c85bf5f8, Feb 26 2025, 10:43:25)
Type 'copyright', 'credits' or 'license' for more information
IPython 9.0.2 -- An enhanced Interactive Python. Type '?' for help.
Tip: IPython 9.0+ have hooks to integrate AI/LLM completions.
In [1]: b'\xce\x9e\xce\xad\xce\xbd\xce\xb7 \xce\x93\xce\xae\xce\xb9\xce\xbd\xce\xbf\xcf\x82'.decode('utf8')
Out[1]: 'Ξένη Γήινος'
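As far as I understand, the two [Console]:: assignments change the console's input and output code pages. A Python-side sketch of the same idea is below. It is an assumption that this behaves the same way under PyPy3, and I have not verified it there; the function name `use_utf8_console` is hypothetical. It could be tried from a startup script, for example a file in IPython's profile startup directory.

```python
import ctypes
import sys

CP_UTF8 = 65001  # the same code page number that chcp 65001 selects


def use_utf8_console():
    """Try to switch the attached console to UTF-8.

    Returns True if both Win32 calls report success, False otherwise
    (including on non-Windows platforms, where nothing is changed).
    """
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    ok_in = kernel32.SetConsoleCP(CP_UTF8)         # console input code page
    ok_out = kernel32.SetConsoleOutputCP(CP_UTF8)  # console output code page
    return bool(ok_in and ok_out)


print("console switched to UTF-8:", use_utf8_console())
```

Whether this runs early enough to affect the prompt that IPython draws is also an assumption, so treat it as something to experiment with rather than a known fix.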
Now, how can I launch PyPy3 directly in Windows Terminal, without starting PowerShell first, and still have PyPy3 display Unicode characters correctly?
Oh, and the output for the first code block from the linked GitHub page is:
pypy win32 3.11.11 (0253c85bf5f8, Feb 26 2025, 10:43:25)
[PyPy 7.3.19 with MSC v.1941 64 bit (AMD64)]
os.device_encoding(0)='cp850'
os.device_encoding(1)='cp850'
sys.getdefaultencoding()='utf-8'
sys.getfilesystemencoding()='utf-8'
locale.getpreferredencoding()='utf-8'
locale.getencoding()='utf-8'
locale.getlocale()=('English_United States', '1252')
locale.getlocale()=('English_United States', '1252')
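The 'cp850' reported by os.device_encoding matches the garbling in the first session: decoding the UTF-8 bytes of my user name as cp850 reproduces the broken output character for character. A small sketch of that check, in plain Python with no console involved:

```python
utf8_bytes = b'\xce\x9e\xce\xad\xce\xbd\xce\xb7 \xce\x93\xce\xae\xce\xb9\xce\xbd\xce\xbf\xcf\x82'

# What the text is supposed to be.
intended = utf8_bytes.decode('utf-8')

# What a cp850 console makes of the same bytes.
garbled = utf8_bytes.decode('cp850')

print(intended)  # Ξένη Γήινος
print(garbled)   # ╬×╬¡╬¢╬À ╬ô╬«╬╣╬¢╬┐¤é

# The garbling is lossless, so it can be reversed.
print(garbled.encode('cp850').decode('utf-8') == intended)  # True
```

This suggests PyPy3 is writing UTF-8 bytes while the console interprets them with its OEM code page.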
What follows is a workaround, not a fix. chcp 65001 only works in Command Prompt, not PowerShell.
I have now tested that running chcp 65001 in Command Prompt and then starting PyPy3 fixes the problem. So I can edit the command line field of my PyPy3 profile in Windows Terminal to this:
cmd /k chcp 65001 & D:\Programs\PyPy3\Scripts\Ipython.exe
This starts a cmd session, sets the code page, and then launches IPython on PyPy3 in that same session.
This fixes the problem for me, but the permanent solution is for the PyPy contributors to fix the bug. New versions have been released without a fix for it.