windows, powershell, utf-8, console-application

Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows PowerShell (Windows 10)


I've been forcing the use of chcp 65001 in Command Prompt and Windows PowerShell for some time now, but judging by Q&A posts on SO and several other communities, it seems to be a dangerous and inefficient solution. Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry? And if there isn't one, is there a publicly announced timeline or agenda to support UTF-8 in the Windows CLI in the future?

Personally, I've been using chcp 949 for Korean character support, but the odd display of the backslash \, the incorrect or incomprehensible output in several applications (such as Neovim), and the lack of support for non-Korean characters under 949 have become more of a problem lately.


Solution

  • Note:


    Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry?

    As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8, but the feature is still in beta as of this writing and fundamentally has far-reaching consequences.

    To activate it:

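    Control Panel > Region > Administrative > Change system locale..., then check "Beta: Use Unicode UTF-8 for worldwide language support" and restart when prompted.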


    If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:

    Note: The caveat regarding legacy console applications mentioned above applies equally here. If running legacy console applications is important to you, see eryksun's recommendations in the comments.

    # PowerShell: add this line to your $PROFILE so that console input/output
    # and data piped to external programs all use UTF-8:
    $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
    
    # Programmatically prepend that line to the current user's $PROFILE,
    # keeping any existing profile content:
    '$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding' + [Environment]::Newline + (Get-Content -Raw $PROFILE -ErrorAction SilentlyContinue) | Set-Content -Encoding utf8 $PROFILE
    
    # Auto-execute `chcp 65001` whenever the current user opens a `cmd.exe` console
    # window (including when running a batch file):
    Set-ItemProperty 'HKCU:\Software\Microsoft\Command Processor' AutoRun 'chcp 65001 >NUL'
    

    Optional reading: Why using the Windows PowerShell ISE is ill-advised in general:

    While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:


    [1] In PowerShell, if you never call external programs, you needn't worry about the system locale (active code pages): PowerShell-native commands and .NET calls always communicate via UTF-16 strings (native .NET strings) and on file I/O apply default encodings that are independent of the system locale. Similarly, because the Unicode versions of the Windows API functions are used to print to and read from the console, non-ASCII characters always print correctly (within the rendering limitations of the console).
    In cmd.exe, by contrast, the system locale matters for file I/O (with < and > redirections and, notably, for the encoding assumed for batch-file source code), not just for in-memory communication with external programs (such as when reading program output in a for /f loop).
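To make the point about external programs concrete, here is a minimal sketch of what happens when bytes cross a process boundary and the reader assumes the wrong code page. It is written in Python purely as a neutral way to show the byte-level effect; the sample string and the helper name read_as are made up for illustration and are not part of any Windows or PowerShell API.

```python
# Illustration only: simulate an external program that emits UTF-8 bytes
# and a reader that decodes them under different assumed code pages.

def read_as(raw: bytes, code_page: str) -> str:
    """Decode raw bytes the way a reader assuming `code_page` would.

    Undecodable byte sequences become U+FFFD instead of raising.
    (Hypothetical helper, named for this example only.)
    """
    return raw.decode(code_page, errors="replace")


text = "한글 and ASCII"           # hypothetical sample output of a program
utf8_bytes = text.encode("utf-8")  # what a UTF-8-emitting program writes

# Compare only; printing the decoded text itself could fail on a
# console that is not set to UTF-8, which is the very problem at hand.
for code_page in ("utf-8", "cp949", "cp437"):
    same = read_as(utf8_bytes, code_page) == text
    print(f"{code_page}: decoded text matches original = {same}")

# Pure ASCII bytes are identical in all three code pages, which is why
# the mismatch only shows up with non-ASCII characters.
ascii_bytes = "plain ASCII".encode("utf-8")
print(read_as(ascii_bytes, "cp949") == "plain ASCII")
```

Only the reader that assumes UTF-8 recovers the original string; the others silently produce different text, which is the kind of garbling that aligning the code pages (system-wide or via the startup commands) is meant to avoid.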

    [2] In PowerShell v4 and earlier, where the static ::new() method isn't available, use $OutputEncoding = (New-Object System.Text.UTF8Encoding).psobject.BaseObject. See GitHub issue #5763 for why the .psobject.BaseObject part is needed.