json powershell utf-8 character-encoding invoke-webrequest

JSON rejected for invalid UTF-8 start byte 0xa0, but encoding appears valid


I'm creating a JSON file in PowerShell 7.4 to send to a third-party REST endpoint. Out-File defaults to UTF-8, and when I open the file in Notepad++, its encoding shows as UTF-8. Unfortunately, the POST is rejected with this message:

400 Bad Request
JSON parse error
Nested exception is com.fasterxml.jackson.databind.JsonMappingException: Invalid UTF-8 start byte 0xa0\n at line: 9259, column: 38

I examined the JSON line named in the error message. The source JSON file contains a NO-BREAK SPACE (U+00A0) in a company name:

String        Hex
------------  --------------------------------------
"Acme, Inc."  22 41 63 6d 65 2c c2 a0 49 6e 63 2e 22

In UTF-8, NO-BREAK SPACE is encoded as the two bytes 0xc2 0xa0. Both bytes are present in the JSON file, but the error seems to indicate that the remote parser isn't treating the first byte as part of the sequence.
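The hex dump can be double-checked from PowerShell itself. This minimal sketch (the variable names are illustrative) builds the company name with `[char]0xA0` and prints the UTF-8 bytes .NET produces for it, which match the bytes between the quotes in the table:

```powershell
# Company name with an explicit NO-BREAK SPACE (U+00A0) between "Acme," and "Inc."
$name = "Acme,$([char]0xA0)Inc."

# Encode the .NET (UTF-16) string as UTF-8 and render each byte as two hex digits
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($name)
$utf8Hex   = ($utf8Bytes | ForEach-Object { $_.ToString('x2') }) -join ' '

$utf8Hex   # → 41 63 6d 65 2c c2 a0 49 6e 63 2e
```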

Here's the PowerShell script:

# identify CSV file

   $csvFile = Get-ChildItem -Path ($path + '*.csv') -File | 
                 Sort-Object LastWriteTime | 
                 Select-Object -First 1 

# suppress blank lines

   $objData = Get-Content $csvFile -Encoding UTF8 | 
                 Where-Object { $_ } | 
                 ConvertFrom-CSV

# convert to JSON and save to file
     
   $body = $objData | 
              ConvertTo-Json -Depth 100

   $body | 
      Out-File ( $path + 'data.json')
        
# post JSON

   $webParam = @{
      Uri         = $url
      Method      = 'POST'
      Headers     = @{ 'Authorization' = $auth
                       'Cache-Control' = 'no-cache' }
      Body        = $body
      ContentType = 'application/json'
   }

   $apiResponse = Invoke-WebRequest @webParam

The data usually differs each time the script runs. Most of the time the remote site accepts the JSON without issue, because the data doesn't contain any unusual (non-ASCII) characters.
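ASCII characters (U+0000 to U+007F) are encoded as the same single bytes in UTF-8 and in single-byte encodings such as ISO-8859-1, so a payload made only of ASCII characters comes out byte-for-byte identical whichever of these encodings is used. A hedged diagnostic sketch for spotting the runs that could be affected: list every non-ASCII character in the JSON text along with its position. The sample `$body` here is hypothetical; in the script it would be the output of `ConvertTo-Json`.

```powershell
# Hypothetical sample body containing one NO-BREAK SPACE (U+00A0)
$body = '{ "company": "Acme,' + [char]0xA0 + 'Inc." }'

# Match every UTF-16 code unit outside the 7-bit ASCII range and report where it is
$nonAscii = [regex]::Matches($body, '[^\x00-\x7F]') | ForEach-Object {
    [pscustomobject]@{
        Index     = $_.Index                               # offset in the .NET string
        CodePoint = 'U+{0:X4}' -f [int][char]($_.Value)    # e.g. U+00A0
    }
}

$nonAscii
```

Characters outside the Basic Multilingual Plane (emoji, for example) show up as two entries, one per UTF-16 surrogate, because .NET regular expressions operate on UTF-16 code units.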

I'm not sure why the remote site rejects the string, but the error would make sense if it isn't recognizing the full two-byte sequence. PowerShell's Test-Json cmdlet always returns $true for the body before I send it. Has anyone encountered this before?


Solution

  • To make PowerShell use UTF-8 even in versions 7.3.x and below (including Windows PowerShell) when it transmits the .NET string passed to the -Body parameter of Invoke-WebRequest, use
    -ContentType 'application/json; charset=utf-8' (this is no longer necessary in PowerShell 7.4+); applied to your splatting scenario:

        $webParam = @{
           Uri         = $url 
           Method      = 'POST' 
           Headers     =  @{ 'Authorization' = $auth
                             'Cache-Control' = 'no-cache' }
           Body        = $body 
           # Note the addition of '; charset=utf-8'
           ContentType = 'application/json; charset=utf-8'
        }
      
    $apiResponse = Invoke-WebRequest @webParam
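Why the server sees a lone 0xa0: the data.json file written by Out-File is never sent. Invoke-WebRequest receives the .NET string in $body and encodes it itself, and in PowerShell 7.3.x and below a string body whose Content-Type carries no charset is encoded as ISO-8859-1 (Latin-1), where U+00A0 is the single byte 0xA0. The sketch below imitates that step with .NET's encoders rather than calling Invoke-WebRequest, and uses a strict UTF-8 decoder as a stand-in for the server's parser; the sample string is hypothetical.

```powershell
# Hypothetical stand-in for $body; only the NO-BREAK SPACE matters here
$body = "Acme,$([char]0xA0)Inc."

# Charset-less string body on PowerShell 7.3.x and below: ISO-8859-1,
# which maps U+00A0 to the single byte 0xA0
$latin1      = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$latin1Bytes = $latin1.GetBytes($body)
$latin1Hex   = ($latin1Bytes | ForEach-Object { $_.ToString('x2') }) -join ' '

# With '; charset=utf-8' the same string becomes UTF-8, i.e. the pair C2 A0
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($body)

# Strict decoder: no BOM, throwOnInvalidBytes = $true (mimics a strict JSON parser)
$strictUtf8 = [System.Text.UTF8Encoding]::new($false, $true)

# The UTF-8 bytes decode back to the original string
$decodedOk = $strictUtf8.GetString($utf8Bytes)

# The Latin-1 bytes contain 0xA0 without a lead byte, so decoding throws
try {
    $null = $strictUtf8.GetString($latin1Bytes)
    $latin1Rejected = $false
}
catch {
    $latin1Rejected = $true
}

$latin1Hex            # → 41 63 6d 65 2c a0 49 6e 63 2e
$decodedOk -eq $body  # → True
$latin1Rejected       # → True
```

Since 7.4+ already defaults to UTF-8 for such bodies, this symptom suggests the script may actually be executed by an older host (for example Windows PowerShell's powershell.exe instead of pwsh.exe); logging `$PSVersionTable.PSVersion` from inside the script would confirm which one runs it.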