I'm a Windows and PowerShell noobie. I'm coming from Linux land. I used to have this little Bash alias in my .bashrc that would copy a "shruggie" (¯\_(ツ)_/¯) to the clipboard for me so that I could paste it into conversations on Slack and such.
My Bash alias looked like this: alias shruggie='printf "¯\_(ツ)_/¯" | xclip -selection c && echo "¯\_(ツ)_/¯"'
I realize that this question is juvenile, but the answer does have value to me, as I'm sure that I will need to pipe odd UTF-8 characters to output in a PowerShell script at some point in the future.
I wrote this function in my PowerShell profile:
function shruggie() {
'¯\_(ツ)_/¯' | clip
Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
However, when I call it on the command line, this gives me: ??\_(???)_/?? (unknown UTF-8 chars are converted to ?).
I've looked at [System.Text.Encoding]::UTF8 and some other questions, but I don't know how to cast my string as UTF-8, pass it through clip.exe, and receive UTF-8 out on the other side (on the clipboard).
There are two distinct, independent aspects:
- copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe
- printing ¯\_(ツ)_/¯ to the console
Prerequisite: PowerShell must properly recognize your source code's encoding for the solutions below to work: if your source code is UTF-8-encoded, be sure to save the enclosing files as UTF-8 with BOM so that Windows PowerShell recognizes it.
In the absence of a BOM, Windows PowerShell interprets source code as "ANSI"-encoded, i.e., as using the legacy, single-byte, extended-ASCII code page in effect, such as Windows-1252 on US-English systems, and would therefore misinterpret UTF-8-encoded source code.
Note that, by contrast, PowerShell Core uses UTF-8 as the default, so the BOM is no longer necessary (but still recognized).
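If you're unsure whether a script file already has the UTF-8 BOM (the bytes EF BB BF), a minimal sketch like the following can check for it and, if needed, re-save the file with one. The function names Test-Utf8Bom and Save-AsUtf8WithBom are hypothetical, and the re-save assumes the file's current content is BOM-less UTF-8.

```powershell
# Hypothetical helper: returns $true if the file begins with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom([string] $Path) {
  # Resolve to a full path, because .NET methods don't know PowerShell's current location.
  $fullPath = Convert-Path -LiteralPath $Path
  $bytes = [System.IO.File]::ReadAllBytes($fullPath)
  return $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
}

# Hypothetical helper: rewrites a (BOM-less) UTF-8 file as UTF-8 *with* BOM,
# so that Windows PowerShell doesn't misread it as "ANSI".
function Save-AsUtf8WithBom([string] $Path) {
  $fullPath = Convert-Path -LiteralPath $Path
  # ReadAllText() assumes UTF-8 when no BOM is present.
  $text = [System.IO.File]::ReadAllText($fullPath)
  # Constructor argument $true = emit the BOM when writing.
  $utf8WithBom = New-Object System.Text.UTF8Encoding $true
  [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```

Usage: Test-Utf8Bom .\profile.ps1, and if that returns False, Save-AsUtf8WithBom .\profile.ps1 (the path is just an example).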
Copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe:
In Windows PowerShell v5.1+, you can use the built-in Set-Clipboard cmdlet to copy text to the clipboard from within PowerShell; given that PowerShell uses the .NET System.String type, which can represent all Unicode characters, there are no encoding issues.
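A sketch of a Set-Clipboard-based variant of the shruggie function, assuming a PowerShell version that has Set-Clipboard. As an illustrative choice, it builds the string from code points, so it works regardless of how the script file itself is encoded; the helper name Get-Shruggie is hypothetical.

```powershell
# Hypothetical helper: builds the shruggie from code points, so the result doesn't
# depend on the script file's encoding (no BOM required).
function Get-Shruggie {
  $macron = [char] 0x00AF   # U+00AF MACRON
  $tu     = [char] 0x30C4   # U+30C4 KATAKANA LETTER TU
  # In double-quoted strings, "\" is a literal character (PowerShell's escape char is `).
  "$macron\_($tu)_/$macron"
}

# Assumes Set-Clipboard is available (Windows PowerShell v5.1+, per the above).
function shruggie {
  Get-Shruggie | Set-Clipboard
  Write-Verbose -Verbose 'Shruggie copied to clipboard.'
}
```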
In earlier versions of Windows PowerShell and in PowerShell Core, using clip.exe is a viable alternative, but it requires additional work:
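function shruggie() {
  # Make PowerShell pipe data to external programs (here: clip.exe) as BOM-less UTF-16LE.
  # Assigning inside the function creates a function-local variable; the session-wide setting is unaffected.
  $OutputEncoding = (New-Object System.Text.UnicodeEncoding $False, $False).psobject.BaseObject
  '¯\_(ツ)_/¯' | clip
  Write-Verbose -Verbose "Shruggie copied to clipboard." # see section about console output
}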
New-Object System.Text.UnicodeEncoding $False, $False creates a BOM-less UTF-16LE encoding, which clip.exe understands.
The .psobject.BaseObject incantation is, unfortunately, required to work around a bug; in PSv5+, you can bypass this bug by using the following instead: [System.Text.UnicodeEncoding]::new($False, $False)
Assigning that encoding to the preference variable $OutputEncoding ensures that PowerShell uses it to pipe data to the external utility clip.exe.
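To see what clip.exe actually receives with this setting, you can ask the encoding object for its preamble (the BOM) and for the bytes of a sample character. A small sketch using the PSv5+ ::new() syntax; the variable names are illustrative.

```powershell
# BOM-less UTF-16LE: first argument $false = little-endian, second $false = no BOM.
$utf16le = [System.Text.UnicodeEncoding]::new($false, $false)

# No preamble, i.e. no BOM bytes are prepended.
$utf16le.GetPreamble().Length                               # 0

# U+30C4 (KATAKANA LETTER TU) is a single 16-bit code unit, written low byte first.
$bytes = $utf16le.GetBytes([string] [char] 0x30C4)
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '      # C4 30
```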
Printing ¯\_(ツ)_/¯ to the console:
Note: PowerShell Core on Unix platforms generally uses consoles (terminals) with a default encoding of (BOM-less) UTF-8, so no additional work is needed there.
To merely echo (print) Unicode characters (beyond the 8-bit range), it is sufficient to switch to a font that can display Unicode characters (beyond the extended ASCII range), because, as PetSerAl points out, PowerShell uses the Unicode version of the WriteConsole Windows API function to print to the console.
To support (most) Unicode characters, you must switch to one of the "TT" (TrueType) fonts.
PetSerAl points out in a comment that console windows on Windows are currently limited to a single 16-bit code unit per output character (cell); given that only (most of) the characters in the BMP (Basic Multilingual Plane) are self-contained 16-bit code units, the (rare) characters beyond the BMP cannot be represented.
Sadly, even that may not be enough for some (BMP) Unicode characters, given that the Unicode standard is versioned and font representations / implementations may lag.
Indeed, as of Windows 10 release ID 1703, only a select few fonts can render ツ (Unicode character KATAKANA LETTER TU, U+30C4, UTF-8: E3 83 84):
- MS Gothic
- NSimSun
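The code point facts above (ツ fits in a single 16-bit code unit, its UTF-8 bytes are E3 83 84, and characters beyond the BMP need two code units) can be checked directly in PowerShell. A sketch; U+1F937 (SHRUG) is used here only as an assumed example of a non-BMP character.

```powershell
# U+30C4 (KATAKANA LETTER TU) lies in the BMP: one UTF-16 code unit.
$tu = [string] [char] 0x30C4
$tu.Length                                                  # 1

# Its UTF-8 encoding is the three bytes quoted above.
([System.Text.Encoding]::UTF8.GetBytes($tu) | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # E3 83 84

# A character beyond the BMP, e.g. U+1F937 (SHRUG), needs a surrogate pair,
# i.e. two 16-bit code units, which, per the above, a single console cell cannot represent.
$shrug = [char]::ConvertFromUtf32(0x1F937)
$shrug.Length                                               # 2
[char]::IsSurrogatePair($shrug, 0)                          # True
```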
Note that if you want to (also) change how other applications interpret such output, you must again set $OutputEncoding:
For instance, to make PowerShell expect UTF-8 input from external utilities as well as output UTF-8-encoded data to external utilities, use the following:
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
The above implicitly changes the code page to 65001 (UTF-8), as reflected in chcp (chcp.com).
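From within PowerShell, you can also read the code pages off the encoding objects instead of running chcp. The sketch below additionally illustrates a detail of the command above: the parameterless New-Object System.Text.UTF8Encoding creates an encoding without a BOM preamble, whereas the static [System.Text.Encoding]::UTF8 property includes one. The first two outputs depend on your session, so they are not shown.

```powershell
# Code page currently used for console output, and for piping to external programs.
[console]::OutputEncoding.CodePage
$OutputEncoding.CodePage

# The parameterless constructor yields UTF-8 *without* a BOM preamble...
(New-Object System.Text.UTF8Encoding).GetPreamble().Length     # 0
# ...whereas the static UTF8 property's preamble is the 3-byte BOM EF BB BF.
[System.Text.Encoding]::UTF8.GetPreamble().Length              # 3
# Either way, the code page is 65001.
(New-Object System.Text.UTF8Encoding).CodePage                 # 65001
```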
Note that, for backward compatibility, Windows console windows still default to the single-byte, extended-ASCII legacy OEM code page, such as 437 on US-English systems.
Unfortunately, as of v6.0.0-rc.2, this also applies to PowerShell Core, even though it has otherwise switched to BOM-less UTF-8 as the default encoding, as also reflected in $OutputEncoding.