powershell azure-devops character-encoding devops invoke-restmethod

How to get Unicode data from the Azure DevOps Git repository Get Item REST API?


I prepared the following request to get a file's content from the Azure DevOps repo items API. The file is stored in Git in UTF-8 format, but the output of the REST API is not as expected. How can I fix this so that the content is returned exactly as stored in the repo?

$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$contentType  = "application/json;charset=utf-8"
$headers = @{ Authorization = "Basic $encodedPAT" }

$fileContent = Invoke-RestMethod -Uri $uri -Headers $headers -ContentType $contentType -Method Get

The output is Markdown content:

Title|Description|WorkItemID|Software|Area|Type|BuildNumber|Date
-|-|-|-|-|-|-|-
رÙع اشکا٠ÙÙاÛØ´ داد٠Ùشد٠Ùا٠ÙÙاÛØ´Û ÙدعÙÛ٠در صÙØ­Ù ÙشاÙد٠جÙسÙ|this is description|409925|Organizer||Bug|20231206.1|2023-12-06

Solution

  • tl;dr

    To ensure decoding as UTF-8, use Invoke-WebRequest rather than Invoke-RestMethod; the former's output objects have a .RawContentStream property that returns a raw byte stream that you can decode with the encoding of your choice.

    Applied to your code (as noted, only required in PowerShell versions 7.3.x and below, including in Windows PowerShell):

    $uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
    $headers = @{ Authorization = "Basic $encodedPAT" }
    
    $fileContent = 
     [System.Text.Encoding]::UTF8.GetString(
       (
         Invoke-WebRequest -Uri $uri -Headers $headers -Method Get
       ).RawContentStream.ToArray()
     )
    

    Note the use of [System.Text.Encoding]::UTF8 to obtain a UTF-8 encoding, and its .GetString() method to convert an array of bytes to a .NET string.
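
    Because the workaround is only needed in versions 7.3.x and below, it could be wrapped in a small version-aware helper. The following is a minimal sketch, not part of the original code: the function name Get-Utf8WebContent and its parameters are hypothetical, and it assumes the server returns a text response whose bytes are UTF-8 encoded.

    ```powershell
    # Hypothetical helper: returns the response body as a string decoded as UTF-8.
    function Get-Utf8WebContent {
        param(
            [Parameter(Mandatory)] [string] $Uri,
            [hashtable] $Headers = @{}
        )

        $v = $PSVersionTable.PSVersion
        $defaultsToUtf8 = ($v.Major -gt 7) -or ($v.Major -eq 7 -and $v.Minor -ge 4)

        if ($defaultsToUtf8) {
            # PowerShell 7.4+ decodes as UTF-8 by default, so no workaround is needed.
            return Invoke-RestMethod -Uri $Uri -Headers $Headers -Method Get
        }

        # Older versions: take the raw bytes and decode them explicitly.
        # -UseBasicParsing avoids the Internet Explorer-based parser in Windows PowerShell.
        $response = Invoke-WebRequest -Uri $Uri -Headers $Headers -Method Get -UseBasicParsing
        [System.Text.Encoding]::UTF8.GetString($response.RawContentStream.ToArray())
    }
    ```

    With $uri and $headers defined as above, it would be called as $fileContent = Get-Utf8WebContent -Uri $uri -Headers $headers.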


    Background information:

    The default character encoding used by the Invoke-WebRequest and Invoke-RestMethod cmdlets depends on the PowerShell edition and version, as shown in the following table:

    Edition | Version | Default encoding
    -|-|-
    Windows PowerShell | up to 5.1, the latest and last version | ISO 8859-1[1]
    PowerShell (Core) | 7.0 - 7.3.x | ISO 8859-1, except for application/json responses,[2] which default to UTF-8
    PowerShell (Core) | 7.4 and above | UTF-8

    [1] This encoding is largely identical to Windows-1252, except that the following characters are missing, notably including €:
    € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ

    [2] Note that request JSON data passed as a string to the -Body parameter is, curiously, still encoded as ISO 8859-1 by default, an inconsistency that was resolved in v7.4.
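
    To see how an ISO 8859-1 default produces garbled text of the kind shown in the question, the effect can be reproduced locally without any web request. This is an illustrative sketch: the sample string is a hypothetical stand-in (three Persian letters built from code points so the script does not depend on the file's own encoding). The last step also shows that text which was mis-decoded as ISO 8859-1 can be repaired after the fact, assuming nothing else has altered the string in between.

    ```powershell
    # Sample text (hypothetical): U+0631 U+0641 U+0639, each 2 bytes in UTF-8.
    $original  = -join ([char[]](0x0631, 0x0641, 0x0639))
    $utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

    # Mis-decode the UTF-8 bytes as ISO 8859-1: every byte becomes one character.
    $latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
    $garbled = $latin1.GetString($utf8Bytes)

    # Repair: recover the original bytes, then decode them as UTF-8.
    $repaired = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($garbled))

    $garbled.Length            # 6: one character per UTF-8 byte
    $repaired -ceq $original   # True
    ```

    The round trip is lossless because ISO 8859-1 maps each of the 256 byte values to exactly one character; decoding the raw bytes as UTF-8 in the first place, as shown above, remains the more direct fix.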