
Changing PowerShell's default output encoding to UTF-8


By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.

It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8, but having to repeat this every time is awkward.

The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.

It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'} but I've tried this and it had no effect.

The blog post https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/, which talks about $OutputEncoding, looks at first glance as though it should be relevant, but it then talks about output being encoded in ASCII, which is not what's actually happening.

How do you set PowerShell to use UTF-8?


Solution

  • Note:


    The Windows PowerShell perspective:

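    In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable: $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'. In PSv3 through v5.0 (the variable was introduced in v3), this affects only explicit Out-File calls, not > / >>. To set the default for all cmdlets that have an -Encoding parameter, use $PSDefaultParameterValues['*:Encoding'] = 'utf8' instead. Note that in Windows PowerShell, 'utf8' invariably creates UTF-8 files with a BOM.

    If you place the '*:Encoding' form in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.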
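A minimal sketch of such a $PROFILE entry, assuming PSv5.1 or higher so that > and >> honor it. The temp-file check at the end is only an illustration (the file name utf8-demo.txt is made up) and does not belong in a profile: it redirects a string containing 'é' (U+00E9) and dumps the resulting bytes, where UTF-8 shows up as C3 A9 and UTF-16LE would show up as E9 00.

```powershell
# Make Out-File, and thus > / >> in PSv5.1+, default to UTF-8.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# Alternatively, apply the default to every cmdlet that has an -Encoding parameter:
# $PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Illustration only: redirect 'café' to a temp file and inspect the raw bytes.
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo.txt'
"caf$([char]0xE9)" > $demoFile
$demoBytes = [System.IO.File]::ReadAllBytes($demoFile)
Remove-Item -LiteralPath $demoFile

# Bytes as hex: look for C3 A9 (UTF-8 'é'); Windows PowerShell also prepends EF BB BF (BOM).
($demoBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```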

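    Similarly, be sure to include such commands in any scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or on a different machine. However, to avoid a session-global change there, assign a new hashtable rather than modifying the existing one, e.g. $PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }; this creates a local copy of $PSDefaultParameterValues that applies only to the script's own scope and its child scopes.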
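The scoping behavior of that local copy can be tried out without a script file, because a script block invoked with & gets its own child scope, just like a script does. This is a sketch; $hadGlobal, $seenInside and $stillGlobal are illustrative names only.

```powershell
# Does the session-global table already have a '*:Encoding' entry?
$hadGlobal = $PSDefaultParameterValues.ContainsKey('*:Encoding')

$seenInside = & {
    # What a script would do at its top: assign a NEW hashtable.
    # This creates a local $PSDefaultParameterValues in this scope that
    # shadows the global one, so only this scope and its children see it.
    $PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
    $PSDefaultParameterValues['*:Encoding']
}

# Back in the caller's scope, the global table is unchanged.
$stillGlobal = $PSDefaultParameterValues.ContainsKey('*:Encoding')
"inside: $seenInside; global entry before: $hadGlobal, after: $stillGlobal"
```

Adding a key instead ($PSDefaultParameterValues['*:Encoding'] = 'utf8') would modify the existing global hashtable in place, which is exactly the session-wide effect the local copy avoids.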

    For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.


    The automatic $OutputEncoding variable is unrelated, and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
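For completeness, a sketch of setting $OutputEncoding, which only matters when piping strings to external programs. Choosing BOM-less UTF-8 here is an assumption about what the receiving program expects. In Windows PowerShell the default is ASCII, which is what the blog post cited in the question is about.

```powershell
# Encoding PowerShell uses for strings it pipes TO external programs,
# e.g. "text" | some.exe. It has no effect on > / >> or on cmdlets like Out-File.
# The UTF8Encoding constructor argument $false means "do not emit a BOM".
$OutputEncoding = New-Object System.Text.UTF8Encoding $false

$OutputEncoding.WebName                 # utf-8
$OutputEncoding.GetPreamble().Length    # 0, i.e. no BOM
```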


    Optional reading: The cross-platform perspective: PowerShell (Core) 7:

    PowerShell is now cross-platform, via its PowerShell (Core) 7 edition, whose encoding sensibly and consistently defaults to BOM-less UTF-8, both when writing files and when reading files that have no BOM, in line with Unix-like platforms.
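One way to see the PowerShell 7 default is to look at a file's first bytes. This sketch assumes PowerShell 7, because the utf8BOM encoding name does not exist in Windows PowerShell (where 'utf8' itself always writes a BOM). Get-FirstBytesHex is a hypothetical helper, not a built-in cmdlet, and the file names are made up.

```powershell
# PowerShell 7 only: the utf8BOM encoding name does not exist in Windows PowerShell.

# Hypothetical helper: the first 3 bytes of a file (given by full path) as hex.
function Get-FirstBytesHex([string] $Path) {
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    ($bytes[0..2] | ForEach-Object { $_.ToString('X2') }) -join ' '
}

$tmp = [System.IO.Path]::GetTempPath()
$defaultFile = Join-Path $tmp 'ps7-default.txt'
$bomFile     = Join-Path $tmp 'ps7-bom.txt'

'hi' > $defaultFile                                              # default: BOM-less UTF-8
Set-Content -LiteralPath $bomFile -Value 'hi' -Encoding utf8BOM  # explicit opt-in to a BOM

$defaultStart = Get-FirstBytesHex $defaultFile   # starts with 68 69 ('hi'), i.e. no BOM
$bomStart     = Get-FirstBytesHex $bomFile       # EF BB BF

Remove-Item -LiteralPath $defaultFile, $bomFile
```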


    Inconsistent default encoding behavior in Windows PowerShell:

    Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.



    Cmdlets that write:

    Out-File and > / >> create "Unicode" (UTF-16LE) files by default, in which even ASCII-range characters are represented by 2 bytes each; this notably differs from Set-Content / Add-Content (see next point). New-ModuleManifest and Export-CliXml also create UTF-16LE files.

    Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).

    Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.

    Export-PSSession creates UTF-8 files with BOM by default.

    New-Item -Type File -Value currently creates BOM-less(!) UTF-8.

    The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.

    Start-Transcript invariably creates UTF-8 files with BOM, but see the notes re -Append below.
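To check which of these defaults a given cmdlet actually produced, it helps to look at the file's BOM. Get-BomEncoding below is a hypothetical helper (not a built-in cmdlet), sketched under the assumption that a BOM check is enough: files without a BOM (ASCII, ANSI, or BOM-less UTF-8) cannot be told apart this way, and UTF-32 is not distinguished from UTF-16.

```powershell
# Hypothetical helper: names the encoding implied by a file's BOM, or 'no BOM'.
function Get-BomEncoding {
    param([Parameter(Mandatory)] [string] $LiteralPath)
    # Resolve against PowerShell's current location; .NET's current directory may differ.
    $fullPath = Convert-Path -LiteralPath $LiteralPath
    $b = [System.IO.File]::ReadAllBytes($fullPath)
    if ($b.Length -ge 3 -and $b[0] -eq 0xEF -and $b[1] -eq 0xBB -and $b[2] -eq 0xBF) { return 'UTF-8 with BOM' }
    if ($b.Length -ge 2 -and $b[0] -eq 0xFF -and $b[1] -eq 0xFE) { return 'UTF-16LE' }
    if ($b.Length -ge 2 -and $b[0] -eq 0xFE -and $b[1] -eq 0xFF) { return 'UTF-16BE' }
    'no BOM'
}
```

For example, in Windows PowerShell, running 'hi' > t.txt and then Get-BomEncoding t.txt should report UTF-16LE, whereas a file written by Set-Content should report no BOM (ANSI).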

    Re commands that append to an existing file:

    >> / Out-File -Append make no attempt to match the encoding of a file's existing content. That is, they blindly apply their default encoding unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues['Out-File:Encoding']). In short: you must know the encoding of an existing file's content and append using that same encoding.

    Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content (thanks, js2010). Note that in Windows PowerShell this means that ANSI encoding is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.

    This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in GitHub issue #9423.

    Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
    To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.

    Start-Transcript -Append partially matches the existing encoding: It correctly matches encodings with BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
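A sketch of appending safely, based on the behavior described above: with Out-File -Append, state the existing file's encoding explicitly; Add-Content matches a BOM-marked file on its own. The file name is made up, and UTF-16LE ('unicode') is used because that encoding name is spelled the same way in both editions.

```powershell
$log = Join-Path ([System.IO.Path]::GetTempPath()) 'append-demo.txt'

# Start with a UTF-16LE ("Unicode") file, as Out-File / > create by default in Windows PowerShell.
'first' | Out-File -LiteralPath $log -Encoding unicode

# Out-File -Append does not detect the existing encoding, so pass it explicitly.
'second' | Out-File -LiteralPath $log -Append -Encoding unicode

# Add-Content, by contrast, detects the BOM and matches it without -Encoding.
Add-Content -LiteralPath $log -Value 'third'

# Reading it back: the BOM tells Get-Content that the file is UTF-16LE.
$lines = Get-Content -LiteralPath $log
Remove-Item -LiteralPath $log
$lines
```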


    Cmdlets that read (that is, the encoding used in the absence of a BOM):

    Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
    ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.

    By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM, and so does the switch statement with its -File parameter.
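Because Get-Content in Windows PowerShell assumes ANSI for BOM-less files, reading BOM-less UTF-8 reliably requires an explicit -Encoding argument. A minimal sketch (the file name is made up) that writes the UTF-8 bytes for 'café' without a BOM and reads them back in a way that works in both editions:

```powershell
$f = Join-Path ([System.IO.Path]::GetTempPath()) 'nobom-utf8.txt'

# Write 'café' as BOM-less UTF-8 bytes: 63 61 66 C3 A9.
[System.IO.File]::WriteAllBytes($f, [byte[]] (0x63, 0x61, 0x66, 0xC3, 0xA9))

# Without -Encoding, Windows PowerShell's Get-Content would decode these bytes as ANSI
# (e.g. 'cafÃ©' with code page 1252); PowerShell 7 already assumes UTF-8.
# Passing -Encoding utf8 gives the same, correct result in both editions.
$text = Get-Content -LiteralPath $f -Encoding utf8 -Raw
Remove-Item -LiteralPath $f
$text
```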