
Python standard IO under Windows PowerShell and CMD


I have the following two-line Python (v. 3.10.7) program "stdin.py":

    import sys
    print(sys.stdin.read())

and the following one-line text file "ansi.txt" (CP1252 encoding) containing:

    ‘I am well’ he said. 

Note that the open and close quotes are 0x91 and 0x92, respectively. In Windows-10 cmd mode the behavior of the Python code is as expected:

    python stdin.py < ansi.txt  # --> ‘I am well’ he said.

On the other hand, in Windows PowerShell:

    cat .\ansi.txt | python .\stdin.py  # --> ?I am well? he said.

Apparently, the CP1252 quote characters are treated as non-printable when Python is combined with PowerShell. If I change "stdin.py" to read from a file instead of standard input, Python prints the CP1252 quote characters correctly. PowerShell by itself also recognizes and prints 0x91 and 0x92 correctly.

Questions: can somebody explain to me why cmd works differently than PowerShell in combination with Python? Why doesn't Python recognize the CP1252 quote characters 0x91 and 0x92 when they are piped into it by PowerShell?


Solution

  • tl;dr

    Use the $OutputEncoding preference variable:

    # === In Windows PowerShell ===

    # Using the system's legacy ANSI code page, as Python does by default.
    # NOTE: The & { ... } enclosure isn't strictly necessary, but
    #       ensures that the $OutputEncoding change is only temporary,
    #       by limiting it to the child scope that the enclosure creates.
    & {
     $OutputEncoding = [System.Text.Encoding]::Default
     "‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
    }
    
    # Using UTF-8 instead, which is generally preferable.
    # Note the `-X utf8` option (Python 3.7+)
    & {
     $OutputEncoding = [System.Text.UTF8Encoding]::new()
     "‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
    }
    
    # === In PowerShell (Core) 7+ ===

    # Using the system's legacy ANSI code page, as Python does by default.
    # Note: In PowerShell (Core) / .NET 5+,
    #       [System.Text.Encoding]::Default now reports UTF-8,
    #       not the active ANSI encoding.
    & {
     $OutputEncoding = [System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
     "‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
    }
    
    # Using UTF-8 instead, which is generally preferable.
    # Note the `-X utf8` option (Python 3.7+)
    # NO need to set $OutputEncoding, as it now *defaults* to UTF-8
    "‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
    

    Note:

    That $OutputEncoding and the encoding implied by the console's active code page are not aligned by default is unfortunate. Windows PowerShell will see no more changes, but there is hope for PowerShell (Core): it would make sense to have it default consistently to UTF-8.


    Background information:

    It is the $OutputEncoding preference variable that determines what character encoding PowerShell uses to send data (invariably text, as of PowerShell 7.3) to an external program via the pipeline.

    Thus, the character encoding stored in $OutputEncoding must match the encoding that the target program expects.
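To illustrate why both sides must agree, here is a minimal Python sketch that models the pipe as an encode step (the sender) followed by a decode step (the receiver). The helper name `simulate_pipe` is hypothetical, and Python codecs merely stand in for the .NET encoders that PowerShell actually uses.

```python
def simulate_pipe(text: str, sender_encoding: str, receiver_encoding: str) -> str:
    """Encode as the sending side would, then decode as the receiving side would."""
    raw = text.encode(sender_encoding)
    # errors="replace" keeps this sketch from raising on undecodable bytes.
    return raw.decode(receiver_encoding, errors="replace")


# U+2018 and U+2019 are the curly quotes stored as 0x91 and 0x92 in CP1252.
text = "\u2018I am well\u2019 he said."

matched = simulate_pipe(text, "cp1252", "cp1252")    # both sides agree
mismatched = simulate_pipe(text, "utf-8", "cp1252")  # sender UTF-8, receiver ANSI

print(matched == text)     # the text survives when the encodings match
print(mismatched == text)  # the quotes are garbled when they do not
```

With matching encodings the string round-trips unchanged. With UTF-8 on the sending side and CP1252 on the receiving side, each curly quote arrives as three unrelated characters.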

    By default, the encoding in $OutputEncoding is unrelated to the encoding implied by the console's active code page (which itself defaults to the system's legacy OEM code page, such as 437 on US-English systems). Legacy console applications, at least, tend to use the latter. Python, however, does not: it uses the system's legacy ANSI code page. Other modern CLIs, notably Node.js' node.exe, always use UTF-8.
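To check which encoding a given Python installation will actually apply to piped input, a small diagnostic along these lines may help. The function name `describe_stdin_encoding` is made up for illustration. The reported values depend on the platform, the locale, and whether `-X utf8` is in effect.

```python
import locale
import sys


def describe_stdin_encoding() -> dict:
    """Collect the settings that determine how Python decodes standard input."""
    return {
        # Encoding of the text layer wrapped around stdin (None if stdin is absent).
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        # Locale-derived default; on Windows this is normally the ANSI code page.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when Python runs in UTF-8 mode, e.g. via `-X utf8` (Python 3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


info = describe_stdin_encoding()
for key, value in info.items():
    print(f"{key}: {value}")
```

Saving this to a file and running it with input piped from PowerShell, once with and once without `-X utf8`, should show the difference between the ANSI default and UTF-8 mode.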

    While $OutputEncoding's default in PowerShell (Core) 7+ is now UTF-8, Windows PowerShell's default is, regrettably, ASCII(!), which means that non-ASCII characters get "lossily" transliterated to verbatim ASCII ? characters, which is what you saw.
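The lossy step can be reproduced in Python alone. In this sketch, `errors="replace"` stands in for the replacement behavior of .NET's ASCII encoder, which likewise substitutes `?` for characters it cannot represent. It is a simulation, not what PowerShell literally executes.

```python
# U+2018 and U+2019 are the curly quotes from ansi.txt (0x91 and 0x92 in CP1252).
text = "\u2018I am well\u2019 he said."

# Sender side: encoding to ASCII replaces every non-ASCII character with "?".
ascii_bytes = text.encode("ascii", errors="replace")

# Receiver side: Python decodes the bytes with the ANSI code page (CP1252 here),
# but the original quotes are already gone at this point.
received = ascii_bytes.decode("cp1252")

print(ascii_bytes)
print(received)
```

Because the information is lost before the bytes reach Python, no setting on the Python side can recover the quotes. The fix has to happen on the PowerShell side.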

    Therefore, you must (temporarily) set $OutputEncoding to the encoding that Python expects and/or ask Python to use UTF-8 instead.
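If changing the Python invocation is not an option, the script itself can decode standard input explicitly instead of relying on the default. The following is a sketch of that idea, not part of the original answer. The helper `read_text` is a hypothetical name, and the PowerShell side must still send bytes in the same encoding (for example by setting $OutputEncoding to UTF-8 in Windows PowerShell).

```python
import io
import sys


def read_text(stream, encoding: str = "utf-8") -> str:
    """Read all bytes from a binary stream and decode them with an explicit encoding."""
    return stream.read().decode(encoding)


def main() -> None:
    # sys.stdin.buffer is the raw byte stream underneath sys.stdin,
    # so the locale-dependent default decoding is bypassed.
    print(read_text(sys.stdin.buffer))


# Stand-in for piped input: the sample sentence encoded as UTF-8 bytes.
sample = io.BytesIO("\u2018I am well\u2019 he said.".encode("utf-8"))
decoded = read_text(sample)
print(decoded == "\u2018I am well\u2019 he said.")
```

In a real script, `main()` would be called at the bottom of the file. It is left uncalled here so that the example does not wait for input.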