I prepared the following request to get a file's content from the Azure DevOps repo Items API. The file content is stored in Git in UTF-8 format, but the output of the REST API is not as expected. How can I fix the issue so that I get the content exactly as stored in the repo?
```powershell
$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$contentType = "application/json;charset=utf-8"
$headers = @{ Authorization = "Basic $encodedPAT" }
$fileContent = Invoke-RestMethod -Uri $uri -Headers $headers -ContentType $contentType -Method Get
```
The output is the following Markdown content, with its non-ASCII text garbled:
Title|Description|WorkItemID|Software|Area|Type|BuildNumber|Date
-|-|-|-|-|-|-|-
رÙع اشکا٠ÙÙاÛØ´ داد٠Ùشد٠Ùا٠ÙÙاÛØ´Û ÙدعÙÛ٠در صÙØÙ ÙشاÙد٠جÙسÙ|this is description|409925|Organizer||Bug|20231206.1|2023-12-06
tl;dr
Your `-ContentType` argument has no effect. To ask the target web service to return a JSON response (assuming it supports it), you'll need to:

- use an `Accept` header field, e.g. `-Headers @{ Accept = 'application/json'; Authorization = "Basic $encodedPAT" }`
- alternatively, if available, in the context of a `GET` request, use a query-string parameter to that effect as part of the URL.
The problem isn't specific to Azure DevOps; it is a general problem with PowerShell's web cmdlets. As detailed in the next section, Windows PowerShell and PowerShell (Core) versions before 7.4 mis-decode UTF-8 responses that aren't declared as such in the `Content-Type` field of the response header. This is no longer a problem in PowerShell (Core) 7.4+, which now (consistently) defaults to UTF-8.
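To see the mis-decoding in isolation, here is a minimal offline sketch (no web request involved): it encodes a string as UTF-8 and then decodes the bytes as ISO-8859-1, which is effectively what the affected versions do when no `charset` is declared. Because ISO-8859-1 maps every byte to exactly one character, the damage is reversible, so the sketch also shows an after-the-fact repair. That repair assumes the garbled string came from exactly this ISO-8859-1 decoding and wasn't altered afterwards (e.g. by being saved to a file with another encoding).

```powershell
# Build the sample text ("café €5") from [char] code points so that this script
# behaves the same regardless of the encoding its own source file is saved with.
$original = "caf$([char]0x00E9) $([char]0x20AC)5"

$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# The server sends UTF-8 bytes...
$bytes = $utf8.GetBytes($original)

# ...but the web cmdlets (Windows PowerShell, PowerShell 7.0-7.3.x) decode them as
# ISO-8859-1: each byte becomes one character, so every multi-byte UTF-8 sequence
# turns into several wrong characters (é, bytes C3 A9, becomes "Ã©").
$garbled = $latin1.GetString($bytes)

# Repair: re-encoding with ISO-8859-1 restores the original bytes one-to-one,
# which can then be decoded correctly as UTF-8.
$repaired = $utf8.GetString($latin1.GetBytes($garbled))

"Garbled:  $garbled"
"Repaired: $repaired"
```

The proper fix is still to decode the raw response bytes as UTF-8 in the first place, since that doesn't depend on the intermediate string surviving unchanged.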
To ensure decoding as UTF-8, use `Invoke-WebRequest` rather than `Invoke-RestMethod`: the former's output objects have a `.RawContentStream` property that returns the raw, undecoded response body as a byte stream, which you can decode with the encoding of your choice.
Applied to your code (as noted, this is only required in PowerShell 7.3.x and below, including Windows PowerShell):
```powershell
$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$headers = @{ Authorization = "Basic $encodedPAT" }

# Decode the undecoded response bytes (.RawContentStream) explicitly as UTF-8.
# -UseBasicParsing avoids Windows PowerShell's dependency on the Internet Explorer
# engine; PowerShell (Core) 7+ accepts it for backward compatibility but ignores it.
$fileContent =
  [System.Text.Encoding]::UTF8.GetString(
    (
      Invoke-WebRequest -Uri $uri -Headers $headers -Method Get -UseBasicParsing
    ).RawContentStream.ToArray()
  )
```
Note the use of `[System.Text.Encoding]::UTF8` to obtain a UTF-8 encoding object, and of its `.GetString()` method to convert an array of bytes to a .NET string.
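If you need this in more than one place, the decoding step can be factored into a small function. The following is a minimal sketch: the name `ConvertFrom-Utf8Response` is hypothetical, and it assumes its input is an `Invoke-WebRequest` response object (or anything else with a `.RawContentStream` memory stream). Since `.GetString()` keeps a UTF-8 BOM, if the file has one, as a leading U+FEFF character, the function strips it.

```powershell
function ConvertFrom-Utf8Response {
    # Hypothetical helper: decodes the raw body bytes of an Invoke-WebRequest response
    # as UTF-8, ignoring whatever charset (if any) the server declared.
    param(
        # Expected: an object with a .RawContentStream property of type MemoryStream.
        [Parameter(Mandatory, ValueFromPipeline)]
        $Response
    )
    process {
        $text = [System.Text.Encoding]::UTF8.GetString($Response.RawContentStream.ToArray())
        # Drop a leading byte-order mark (U+FEFF), if present.
        $text.TrimStart([char] 0xFEFF)
    }
}
```

With the variables from your code, usage would be `$fileContent = Invoke-WebRequest -Uri $uri -Headers $headers -UseBasicParsing | ConvertFrom-Utf8Response`.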
The `-ContentType` parameter describes the media type and, optionally, the character encoding of the body (data) sent with the request, not of what you'd like to receive as a response. Since you're merely performing a `GET` request without using the `-Body` parameter, the `-ContentType` argument is effectively ignored.
While there is a header field for signaling to the server which response character encoding is desired, `Accept-Charset`, it is rarely honored in practice. I presume the same applies if you add a `charset` parameter to the media types you request via the `Accept` header field.
It is therefore the server that decides which character encoding to use for the response and, crucially, whether or not to explicitly indicate that encoding in the `Content-Type` response-header field, e.g. `Content-Type: text/markdown; charset=utf-8`.
Strictly speaking, the media type for Markdown text, `text/markdown` (assuming the server uses it in its response), requires a `charset` parameter per RFC 7763, and PowerShell's web cmdlets do honor that parameter.
In the absence of such a `charset` parameter, the default character encoding of PowerShell's web cmdlets, `Invoke-WebRequest` and `Invoke-RestMethod`, applies.
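To find out whether a particular server declares a charset, you can inspect the `Content-Type` header of an `Invoke-WebRequest` response. Below is a minimal sketch that parses such a header value with .NET's `System.Net.Http.Headers.MediaTypeHeaderValue` class; the function name `Get-DeclaredCharset` is hypothetical, and it returns `$null` when no `charset` parameter is present, i.e. when the cmdlets' default encoding kicks in.

```powershell
# The System.Net.Http assembly must be loaded explicitly in Windows PowerShell;
# in PowerShell (Core) 7+ it is already available and this call should simply succeed.
Add-Type -AssemblyName System.Net.Http

function Get-DeclaredCharset {
    # Hypothetical helper: returns the charset parameter of a Content-Type header value,
    # or $null if the value doesn't specify one.
    param(
        [Parameter(Mandatory)]
        [string] $ContentType
    )
    [System.Net.Http.Headers.MediaTypeHeaderValue]::Parse($ContentType).CharSet
}
```

For example, `Get-DeclaredCharset (@($response.Headers['Content-Type'])[0])`, where `$response` holds the output of `Invoke-WebRequest`; the `@(...)[0]` accounts for PowerShell 7 returning header values as arrays, whereas Windows PowerShell returns plain strings.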
That default depends on the PowerShell edition and version, as shown in the following table:
Edition | Version | Default encoding |
---|---|---|
Windows PowerShell | up to 5.1 (the latest and last version) | ISO-8859-1[1] |
PowerShell (Core) | 7.0 to 7.3.x | ISO-8859-1, except for `application/json` responses,[2] which default to UTF-8 |
PowerShell (Core) | 7.4 and above | UTF-8 |
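If a script must run on several PowerShell versions, the response-decoding defaults in the table can be encoded in a small function that tells you whether the manual UTF-8 decoding is needed. This is only a sketch that mirrors the table; the name `Get-WebCmdletDefaultCharset` is hypothetical and not part of PowerShell.

```powershell
function Get-WebCmdletDefaultCharset {
    # Hypothetical helper mirroring the table: the charset Invoke-WebRequest /
    # Invoke-RestMethod assume for a response whose Content-Type has no charset parameter.
    param(
        [Parameter(Mandatory)] [version] $PSVersion,
        [Parameter(Mandatory)] [string]  $MediaType
    )
    if ($PSVersion -ge [version] '7.4') { return 'utf-8' }
    if ($PSVersion -ge [version] '7.0' -and $MediaType -eq 'application/json') { return 'utf-8' }
    'iso-8859-1'   # Windows PowerShell (up to 5.1) and PowerShell 7.0 to 7.3.x otherwise
}

# Example: does the current session need manual UTF-8 decoding for Markdown responses?
# (Major.Minor is extracted explicitly because PowerShell 7 reports a SemanticVersion.)
$current = [version] "$($PSVersionTable.PSVersion.Major).$($PSVersionTable.PSVersion.Minor)"
$needsWorkaround = (Get-WebCmdletDefaultCharset -PSVersion $current -MediaType 'text/markdown') -ne 'utf-8'
```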
This default encoding applies not only to decoding responses, but also to encoding request data, namely when you pass a string to the `-Body` parameter (you may alternatively pass arbitrary `[byte]` arrays). You can override it with a `charset` parameter in the `-ContentType` argument, e.g.:
`-ContentType 'application/json; charset=utf-8'`
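Passing a `[byte[]]` array as the `-Body` argument sidesteps the default request encoding altogether, since the bytes are sent as-is. A minimal sketch, using the hypothetical helper name `ConvertTo-Utf8JsonBody`:

```powershell
function ConvertTo-Utf8JsonBody {
    # Hypothetical helper: serializes an object to JSON and returns its UTF-8 bytes,
    # so the request body's encoding no longer depends on the cmdlets' default.
    param(
        [Parameter(Mandatory)]
        $InputObject
    )
    $json = ConvertTo-Json -InputObject $InputObject -Depth 10 -Compress
    # The unary comma keeps PowerShell from enumerating the byte array on output,
    # so the caller receives a single [byte[]] rather than individual [byte] objects.
    , [System.Text.Encoding]::UTF8.GetBytes($json)
}
```

A call might then look like `Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -ContentType 'application/json; charset=utf-8' -Body (ConvertTo-Utf8JsonBody -InputObject $payload)`, where `$payload` stands for your (hypothetical) request data; declaring the charset in `-ContentType` still tells the server how to interpret the bytes.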
If, in a given call, the response body gets mis-decoded due to the above-mentioned defaults, you need to manually decode the raw bytes, as shown in the top section.
[1] This encoding is largely identical to Windows-1252, except that it lacks the following characters, notably `€`:
€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ
[2] Note that JSON request data passed as a string to the `-Body` parameter is, curiously, still encoded as ISO-8859-1 by default, an inconsistency that was resolved in v7.4.