By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8, but this is awkward to have to repeat every time.
The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.
It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}, but I've tried this and it had no effect.
The post at https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/, which talks about $OutputEncoding, looks at first glance as though it should be relevant, but then it talks about output being encoded in ASCII, which is not what's actually happening.
How do you set PowerShell to use UTF-8?
Note:
The next section applies primarily to Windows PowerShell; the section after it covers the cross-platform PowerShell (Core) 7 edition.
In both cases, the information applies to making PowerShell use UTF-8 for reading and writing files.
A system-wide switch to BOM-less UTF-8 is possible nowadays (since recent versions of Windows 10): see this answer, but note the following caveats:
The feature has far-reaching consequences, because both the OEM and the ANSI code page are then set to 65001, i.e. UTF-8; also, the feature is still considered a beta feature as of this writing (Windows 11 22H2).
Additionally, in Windows PowerShell, it takes effect only for those file-writing cmdlets that default to the ANSI code page, notably Set-Content, and therefore notably not for Out-File / >; the bottom section lists the default encodings of many standard cmdlets.
If you additionally use $PSDefaultParameterValues['*:Encoding'] = 'utf8', as explained below, to make all cmdlets default to UTF-8, file-writing cmdlets will then invariably create UTF-8 files with BOM in Windows PowerShell; this is not a concern in PowerShell 7, which consistently defaults to BOM-less UTF-8 to begin with.
In v5.1 (and also in PowerShell 7), where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
In Windows PowerShell (the legacy, Windows-only, ships-with-Windows edition whose latest and last version is 5.1), this invariably creates UTF-8 files with a BOM.
In PowerShell (Core) 7, BOM-less UTF-8 is the default (see next section), but if you do want a BOM there, you can use 'utf8BOM'.
In v5.0 or below, you cannot change the encoding for > / >>, but, on v3 or higher, the above technique does work for explicit calls to Out-File. (The $PSDefaultParameterValues preference variable was introduced in v3.0.)
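As a rough way to check what the setting above does on your own machine, the following sketch writes a short string with > and inspects the resulting bytes. The file name utf8-demo.txt and the variable names are made up for illustration; the non-ASCII character is built with [char] so that the snippet itself stays pure ASCII.

```powershell
# Make > / >> / Out-File default to UTF-8 for the rest of this session (v5.1+ for > / >>).
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# Hypothetical demo file in the temp directory; a full path is used because
# .NET methods do not follow PowerShell's current location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo.txt'
$text = "caf$([char]0xE9)"    # 'café', built without a non-ASCII literal
$text > $path

# Check whether the file starts with the UTF-8 BOM (EF BB BF).
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"BOM present: $hasBom"

# Read the file back as UTF-8 to confirm the text survived the round trip.
$roundTrip = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::UTF8).Trim()
"Round trip intact: $($roundTrip -eq $text)"
```

Going by the notes above, the BOM check should report a BOM in Windows PowerShell and none in PowerShell 7.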
In v3.0 or higher, if you want to set the default encoding for all cmdlets that support an -Encoding parameter (which in v5.1 and PowerShell 7 includes > and >>), use:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.
Similarly, be sure to include such commands in your scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or on a different machine; however, to avoid a session-global change, use the following form to create a local copy of $PSDefaultParameterValues:
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
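To illustrate the scoping point, here is a minimal sketch in which the local copy lives inside a function. Write-Report and the file name are hypothetical names chosen for this example only.

```powershell
# Hypothetical function: the plain assignment below creates a *local* variable,
# so the default applies only inside this function and whatever it calls.
function Write-Report {
    param([string] $Path)

    $PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
    "r$([char]0xE9)sum$([char]0xE9)" | Set-Content -LiteralPath $Path
}

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'local-default-demo.txt'
Write-Report -Path $path

# Back in the caller's scope, the session-level table is unchanged
# (this prints False unless your profile already set the same key).
$leaked = $PSDefaultParameterValues.ContainsKey('*:Encoding')
"Default leaked into caller scope: $leaked"

# Read the file back as UTF-8 to confirm the encoding that was used.
$readBack = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::UTF8).Trim()
```

The design choice here is to scope the default to the code that needs it rather than to the whole session, so that other scripts running in the same session are left alone.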
For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.
The $OutputEncoding preference variable is unrelated to the issue at hand: it only applies to how PowerShell communicates with external programs. It determines what encoding PowerShell uses when sending strings to them, and defaults to ASCII(!) in Windows PowerShell and BOM-less UTF-8 in PowerShell 7; it therefore has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
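For completeness, this is roughly how you could inspect and change $OutputEncoding if you do need UTF-8 when piping strings to external programs. It is only a sketch and, as explained above, it does not influence the encoding of files written by > or by cmdlets.

```powershell
# Show the encoding currently used when piping strings to external programs.
"Current: $($OutputEncoding.WebName)"

# Switch to BOM-less UTF-8 for this session ($false = do not emit a BOM).
$OutputEncoding = New-Object System.Text.UTF8Encoding $false
"Now:     $($OutputEncoding.WebName)"
```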
PowerShell is now cross-platform, via its PowerShell (Core) 7 edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.
This means that source-code files without a BOM are assumed to be UTF-8, and using > / Out-File / Set-Content defaults to BOM-less UTF-8; explicit use of the utf8 -Encoding argument also creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM by using the utf8BOM value.
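The difference between the two values can be seen by comparing file sizes. The sketch below is guarded so that it only runs in PowerShell 6 or later, since Windows PowerShell does not accept utf8BOM; the file names are arbitrary.

```powershell
$dir     = [System.IO.Path]::GetTempPath()
$noBom   = Join-Path $dir 'no-bom.txt'      # hypothetical file names
$withBom = Join-Path $dir 'with-bom.txt'

if ($PSVersionTable.PSVersion.Major -ge 6) {
    'hi' | Out-File -LiteralPath $noBom   -Encoding utf8
    'hi' | Out-File -LiteralPath $withBom -Encoding utf8BOM

    $sizeNoBom   = [System.IO.File]::ReadAllBytes($noBom).Length
    $sizeWithBom = [System.IO.File]::ReadAllBytes($withBom).Length

    # The 3-byte pseudo-BOM (EF BB BF) accounts for the size difference.
    "Without BOM: $sizeNoBom bytes; with BOM: $sizeWithBom bytes"
}
```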
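If you create PowerShell scripts with an editor on a Unix-like platform, or nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM. That is fine in PowerShell 7, which assumes UTF-8, but Windows PowerShell reads BOM-less source code as ANSI-encoded (see the bottom section), so non-ASCII characters in such a script can be misinterpreted there.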
Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.
This can be a problem, for instance, when you read such a file into a string in bash with, say, text=$(cat file) or text=$(<file) - the resulting variable will contain the pseudo-BOM as the first 3 bytes.
Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.
Note:
The following doesn't aspire to cover all standard cmdlets.
Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.
Historically, the documentation frequently incorrectly claimed that ASCII is the default encoding in Windows PowerShell; fortunately, this has since been corrected.
Cmdlets that write:
Out-File and > / >> create "Unicode" - UTF-16LE - files by default - in which every ASCII-range character (too) is represented by 2 bytes - which notably differs from Set-Content / Add-Content (see next point); New-ModuleManifest and Export-CliXml also create UTF-16LE files.
Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.
Export-PSSession creates UTF-8 files with BOM by default.
New-Item -Type File -Value currently creates BOM-less(!) UTF-8.
The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.
Start-Transcript invariably creates UTF-8 files with BOM, but see the notes re -Append below.
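If you want to see the defaults described above for yourself, a small sketch like the following dumps the raw bytes that > and Set-Content produce for the same input. The file names are arbitrary, and the output depends on your PowerShell edition and on any $PSDefaultParameterValues entries already active in the session.

```powershell
$dir        = [System.IO.Path]::GetTempPath()
$redirected = Join-Path $dir 'default-redirect.txt'      # hypothetical file names
$setContent = Join-Path $dir 'default-set-content.txt'

# Write the same ASCII text with both mechanisms, without specifying an encoding.
'abc' > $redirected
'abc' | Set-Content -LiteralPath $setContent

# Print each file's size and bytes in hex so that BOMs and 2-byte characters are visible.
foreach ($file in $redirected, $setContent) {
    $bytes = [System.IO.File]::ReadAllBytes($file)
    $hex   = ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
    '{0}: {1} bytes: {2}' -f (Split-Path -Leaf $file), $bytes.Length, $hex
}
```

In Windows PowerShell, a leading FF FE followed by two bytes per character indicates UTF-16LE, whereas one byte per character without a BOM is what ANSI (or BOM-less UTF-8) looks like for ASCII-only text.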
Re commands that append to an existing file:
>> / Out-File -Append make no attempt to match the encoding of a file's existing content. That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in v5.1+, via $PSDefaultParameterValues, as shown above). In short: you must know the encoding of an existing file's content and append using that same encoding.
Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content (thanks, js2010). Note that in Windows PowerShell this means that it is ANSI encoding that is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.
This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in GitHub issue #9423.
Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE. To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.
Start-Transcript -Append partially matches the existing encoding: it correctly matches encodings with BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
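Since >> / Out-File -Append leave it to you to know the existing encoding, one possible workaround is to look at the file's BOM first. The helper below, Get-BomEncodingName, is a hypothetical function written for this example, not a built-in cmdlet. It only recognizes UTF-8 and UTF-16 BOMs, does not handle UTF-32, and cannot tell anything about BOM-less files.

```powershell
# Hypothetical helper: maps a file's BOM to a value usable with -Encoding.
# Returns $null when no (recognized) BOM is present. Reads the whole file,
# which is fine for a sketch but wasteful for large files.
function Get-BomEncodingName {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    $bytes = [System.IO.File]::ReadAllBytes((Convert-Path -LiteralPath $LiteralPath))

    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) { return 'utf8' }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) { return 'unicode' }            # UTF-16LE
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) { return 'bigendianunicode' }   # UTF-16BE
    return $null
}

# Usage: create a UTF-16LE file, then append with the encoding detected from its BOM.
$log = Join-Path ([System.IO.Path]::GetTempPath()) 'append-demo.txt'
'first line' | Out-File -LiteralPath $log -Encoding unicode

$enc = Get-BomEncodingName -LiteralPath $log
if ($enc) { 'second line' | Out-File -LiteralPath $log -Append -Encoding $enc }
```

When the helper returns $null you still have to decide on an encoding yourself, which is exactly the ambiguity described in the list above.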
Cmdlets that read (that is, the encoding used in the absence of a BOM):
Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content. ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.
By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM, and so does the switch statement with its -File parameter.
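Given these differing read defaults, the safest approach when reading a BOM-less UTF-8 file is to state the encoding explicitly. The following sketch creates such a file via .NET (so that the result does not depend on any cmdlet default) and reads it back with Get-Content; the file name is arbitrary.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'read-demo.txt'    # hypothetical file name
$text = "na$([char]0xEF)ve"    # 'naïve', built without a non-ASCII literal

# Write BOM-less UTF-8 directly via .NET ($false = no BOM).
[System.IO.File]::WriteAllText($path, $text, (New-Object System.Text.UTF8Encoding $false))

# Explicit -Encoding avoids the ANSI default that Windows PowerShell applies to BOM-less files.
$read = Get-Content -LiteralPath $path -Encoding utf8
"Read back intact: $($read -eq $text)"
```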