
Displaying Unicode in PowerShell


What I'm trying to achieve should be rather straightforward, although PowerShell seems determined to make it hard.

I want to display the full path of files, some with Arabic, Chinese, Japanese and Russian characters in their names.

I always get undecipherable output.

The output seen in the console is consumed as is by another script, and it contains ? instead of the actual characters.

The command executed is:

(Get-ChildItem -Recurse -Path "D:\test" -Include *unicode* | Get-ChildItem -Recurse).FullName

Is there an easy way to launch PowerShell (via the command line or in a fashion that can be written into a script) such that the output is seen correctly?

P.S. I've gone through many similar questions on Stack Overflow, but none of them offer much beyond calling it a Windows console subsystem issue.


Solution

  • Note:


    From the PowerShell (Core) 7 perspective (see the next section for Windows PowerShell), and irrespective of character-rendering issues (also covered in the next section), with respect to communicating with external programs:


    Making your Windows PowerShell console window Unicode (UTF-8) aware:

    The following magic incantation does this in both Windows PowerShell and PowerShell 7 (assigning the [Console] encodings implicitly performs the equivalent of chcp 65001, and setting $OutputEncoding isn't strictly necessary in PowerShell 7, as it defaults to UTF-8 there):

    $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding =
                        New-Object System.Text.UTF8Encoding
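    A minimal sketch of the incantation applied to the command from the question. The two CodePage checks are an added sanity check (65001 is the code page number for UTF-8), and the Test-Path guard is an assumption that simply skips the listing when D:\test does not exist:

```powershell
# Make the console (input and output) and PowerShell's encoding for data
# piped to external programs ($OutputEncoding) all use BOM-less UTF-8.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding =
    New-Object System.Text.UTF8Encoding

# Sanity check: both should now report 65001, the UTF-8 code page.
[Console]::OutputEncoding.CodePage
[Console]::InputEncoding.CodePage

# The command from the question, run only if the folder exists.
$root = 'D:\test'
if (Test-Path -LiteralPath $root) {
    (Get-ChildItem -Recurse -Path $root -Include *unicode* | Get-ChildItem -Recurse).FullName
}
```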
    

    To persist these settings, i.e., to make your future interactive PowerShell sessions UTF-8-aware by default, add the command above to your $PROFILE file.
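    One way to script that $PROFILE step, as a sketch: it creates the profile file if needed and appends the incantation only once, so re-running it is harmless. $PROFILE refers to the current-user, current-host profile, and Windows PowerShell and PowerShell 7 use separate profile files, so it would need to be run once in each:

```powershell
# The line to persist; single quotes keep $OutputEncoding from being expanded now.
$line = '$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding'

# Create the profile file (and its parent directory) if it doesn't exist yet.
if (-not (Test-Path -LiteralPath $PROFILE)) {
    New-Item -ItemType File -Path $PROFILE -Force | Out-Null
}

# Append the line only if it isn't already present.
if (-not (Select-String -LiteralPath $PROFILE -SimpleMatch -Pattern $line -Quiet)) {
    Add-Content -LiteralPath $PROFILE -Value $line
}
```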

    Note: Recent versions of Windows 10 allow setting the system locale to code page 65001 (UTF-8) (the feature is still in beta as of Windows 10 version 1903), which makes all console windows default to UTF-8, including Windows PowerShell's.
    If you do use that feature, setting [Console]::InputEncoding / [Console]::OutputEncoding is no longer strictly necessary, but you'll still have to set $OutputEncoding (which is not necessary in PowerShell 7, where $OutputEncoding already defaults to UTF-8).
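    To check whether that system-wide UTF-8 option is active, a Windows-only sketch, assuming the option is reflected in the ACP (ANSI) and OEMCP (OEM) values under HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage; with the option enabled, both are expected to be 65001:

```powershell
# Windows only: read the system's ANSI (ACP) and OEM (OEMCP) code pages.
# On other platforms the HKLM: drive doesn't exist and $nls stays $null.
$nls = Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' -ErrorAction SilentlyContinue

# The values are stored as strings; both are expected to be '65001'
# when the beta UTF-8 system locale option is on.
$systemUtf8 = ($nls.ACP -eq '65001') -and ($nls.OEMCP -eq '65001')

if ($systemUtf8) {
    'System locale uses UTF-8; in Windows PowerShell only $OutputEncoding still needs setting.'
} else {
    "System code pages: ANSI $($nls.ACP), OEM $($nls.OEMCP); set the console encodings explicitly."
}
```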

    Important:


    Optional background information

    Tip of the hat to eryksun for all his input.


    Superior alternatives to the native Windows console (terminal), conhost.exe

    eryksun suggests two alternatives to the native Windows console window (conhost.exe) that provide better and faster Unicode character rendering, because they use the modern, GPU-accelerated DirectWrite/DirectX API instead of the "old GDI implementation [that] cannot handle complex scripts, non-BMP characters, or automatic fallback fonts."


    [1] Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with chcp (only changes made directly via [Console]::OutputEncoding are picked up).

    [2] I am unclear on how that manifests in practice; do tell us, if you know.
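    To observe the caching described in footnote [1], a Windows-only sketch, assuming a fresh console session whose code page is not already 65001 (437 below is only an example of a typical OEM code page):

```powershell
# chcp changes the console's code page, but .NET keeps the encoding it cached
# at startup, so this should still report the original code page (e.g. 437).
chcp 65001 > $null
[Console]::OutputEncoding.CodePage

# Assigning the property directly changes the console's code page and
# updates .NET's cached encoding, so this reports 65001.
[Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
[Console]::OutputEncoding.CodePage
```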