
How to capture the output string of a UTF-8 program using PowerShell?


The program inv.exe prints console data based on its parameters. The output looks like JSON (or a dictionary), but it is plain printed text. It works when I simply call it without capturing the output:

.\inv.exe getter segments
{28: 'Renda Fixa', 29: 'Renda Variável', ...

However, if I try to capture it, it doesn't work:

$segmentsjson = .\inv.exe getter segments
$segmentsjson
{28: 'Renda Fixa', 29: 'Renda Vari�vel'....

$segmentsjson = .\inv.exe getter segments | ConvertFrom-Json
$segmentsjson
{"28": "Renda Fixa", "29": "Renda Variável"...

What I tried:

1. chcp 65001

2. $OutputEncoding = [System.Text.Encoding]::UTF8

3. [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

4. $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding

5. .\inv.exe getter segments > test.txt
$segmentsjson = Get-Content "test.txt" -Encoding UTF8

6. .\inv.exe getter segments | Out-File -FilePath "output_temp.txt" -Encoding UTF-8

7. cmd /c inv.exe getter segments > test.txt

Solution

  • Character-encoding problems may surface only when external-program output is captured or redirected in PowerShell on Windows. That is because some CLIs, including high-profile ones such as python.exe and node.exe, print to the console via the Unicode version of the WriteConsole WinAPI function, where all characters print as intended.[1] Captured or redirected output, by contrast, is a byte stream that PowerShell must decode, and that decoding step is where the encodings have to match.

    PowerShell uses [Console]::OutputEncoding to decode external-program output into .NET strings (System.String, i.e. [string] in PowerShell terms), which internally consist of UTF-16 code units (System.Char, i.e. [char]).
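    Two different encoding settings are easy to confuse here; attempts 2 and 4 in the question set $OutputEncoding. As a minimal sketch, the following simply prints both. Only [Console]::OutputEncoding governs how captured external-program output is decoded; $OutputEncoding controls how PowerShell encodes data it pipes to an external program's stdin.

```powershell
# Decoding side: how PowerShell interprets the bytes an external program writes
# to stdout when that output is captured, piped, or redirected.
'[Console]::OutputEncoding: {0} (code page {1})' -f [Console]::OutputEncoding.WebName, [Console]::OutputEncoding.CodePage

# Encoding side: how PowerShell encodes strings it pipes TO an external program's stdin.
# It plays no role in decoding captured output.
'$OutputEncoding:           {0} (code page {1})' -f $OutputEncoding.WebName, $OutputEncoding.CodePage
```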

    If [Console]::OutputEncoding = [System.Text.Encoding]::UTF8 doesn't help, the implication is that inv.exe's output isn't UTF-8.
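    One way to check that inference directly is to test the program's raw bytes for UTF-8 validity. The sketch below defines a hypothetical helper, Test-Utf8 (not a built-in cmdlet), which decodes a byte array with a strict UTF8Encoding instance (throwOnInvalidBytes = $true) and reports whether decoding succeeded. Pure-ASCII output is valid UTF-8 as well, so a $true result is only informative if the output contains non-ASCII characters such as á.

```powershell
# Hypothetical helper: returns $true if $Bytes is well-formed UTF-8, $false otherwise.
function Test-Utf8 {
  param([Parameter(Mandatory)] [byte[]] $Bytes)
  # UTF8Encoding(encoderShouldEmitUTF8Identifier, throwOnInvalidBytes):
  # with throwOnInvalidBytes = $true, invalid input throws instead of
  # being silently replaced with U+FFFD.
  $strictUtf8 = [Text.UTF8Encoding]::new($false, $true)
  try {
    $null = $strictUtf8.GetString($Bytes)
    $true
  }
  catch {
    $false
  }
}

# Example usage (requires cmd.exe and inv.exe, so it is commented out here).
# cmd.exe's > operator writes the program's bytes to the file as-is:
# cmd /c '.\inv.exe getter segments > raw.bin'
# Test-Utf8 ([IO.File]::ReadAllBytes((Convert-Path raw.bin)))
```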

    Thus, you must (temporarily) set [Console]::OutputEncoding to match the character encoding inv.exe actually uses, which appears to be the active ANSI code page of the legacy system locale, presumably Windows-1252.
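    The symptom in the question (Renda Vari�vel) is consistent with this diagnosis: in Windows-1252, á is the single byte 0xE1, which is not a valid UTF-8 sequence by itself, so a UTF-8 decoder replaces it with U+FFFD (�). The following sketch reproduces this in memory, assuming inv.exe emits Windows-1252. It builds the string with [char] 0xE1 so the result does not depend on the encoding the script file itself is saved in.

```powershell
# Assumption: inv.exe encodes its output as Windows-1252 (code page 1252).
# [char] 0xE1 is 'á'; building the string this way avoids depending on the
# encoding of this script file.
$text   = "Renda Vari$([char] 0xE1)vel"
$cp1252 = [Text.Encoding]::GetEncoding(1252)
$bytes  = $cp1252.GetBytes($text)   # simulates the bytes inv.exe would write

# 'á' is the single byte 0xE1 in Windows-1252 ...
'{0:X2}' -f $bytes[10]                    # -> E1

# ... which is not valid UTF-8 by itself, so a UTF-8 decoder substitutes U+FFFD.
[Text.Encoding]::UTF8.GetString($bytes)   # -> Renda Vari�vel

# Decoding with the matching encoding restores the original text.
$cp1252.GetString($bytes)                 # -> Renda Variável
```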

    The following code temporarily sets [Console]::OutputEncoding to the encoding of the system's active ANSI code page, calls .\inv.exe, and then restores the original encoding:

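    $segmentsjson = 
      & {
        $prevEnc = [Console]::OutputEncoding
        try {
          # Set [Console]::OutputEncoding to that of the system's active ANSI code page.
          # In PowerShell (Core) 7+ ($IsCoreCLR), [Text.Encoding]::Default is UTF-8, so the
          # ANSI code page (ACP) is read from the registry; in Windows PowerShell,
          # [Text.Encoding]::Default already is the ANSI encoding.
          [Console]::OutputEncoding = 
            if ($IsCoreCLR) { [Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)) } 
            else { [Text.Encoding]::Default }
    
          .\inv.exe getter segments
        }
        finally {
          # Restore the original encoding, even if the call fails or is interrupted.
          [Console]::OutputEncoding = $prevEnc
        }
      }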
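    If you need this in several places, the same pattern can be wrapped in a function. The sketch below uses a hypothetical name, Invoke-WithConsoleEncoding (not a built-in cmdlet), and a try / finally block so the original encoding is restored even if the script block throws or is stopped with Ctrl+C. The commented-out usage line hard-codes code page 1252, on the assumption that Windows-1252 is the right encoding for inv.exe.

```powershell
# Hypothetical helper (not a built-in cmdlet): runs a script block with
# [Console]::OutputEncoding temporarily set to $Encoding, then restores it.
function Invoke-WithConsoleEncoding {
  param(
    [Parameter(Mandatory)] [Text.Encoding] $Encoding,
    [Parameter(Mandatory)] [scriptblock] $ScriptBlock
  )
  $prevEnc = [Console]::OutputEncoding
  try {
    [Console]::OutputEncoding = $Encoding
    # Output from the script block (e.g., lines captured from an external program,
    # decoded with $Encoding) flows through to the caller.
    & $ScriptBlock
  }
  finally {
    # Runs even if the script block throws or the pipeline is stopped.
    [Console]::OutputEncoding = $prevEnc
  }
}

# Example usage (requires inv.exe, so it is commented out here):
# $segmentsjson = Invoke-WithConsoleEncoding -Encoding ([Text.Encoding]::GetEncoding(1252)) -ScriptBlock { .\inv.exe getter segments }
```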
    

    Note:


    [1] Depending on the selected font, not all Unicode characters may render properly, but the console buffer does store them correctly, so you can copy and paste them without loss of information.