jsonsql-serversql-server-2019

Query to parse JSON using openrowset converting special characters


We are trying to process a JSON document using SQL Server 2019. When using the below query to parse JSON document with openrowset its converting special characters. For instance (Ó or Ñ) are converted to Ñ.

Query:

SELECT d.*
FROM OPENROWSET (BULK '\\drive\data.json', SINGLE_CLOB) X
CROSS APPLY OPENJSON (X.BulkColumn COLLATE Latin1_General_100_CI_AS_SC_UTF8, '$.Data')
WITH ([N] NVARCHAR(250) COLLATE Latin1_General_100_CI_AS_SC_UTF8) AS d

What other options are available to address this issue?


Solution

  • You can get the JSON as varbinary(max) then cast to varchar(max) with a UTF8 collation. The problem is that the cast automatically takes the database default collation. So you need to hack it with CONCAT.

    Note that the collation of the output columns of OPENJSON is the same as the input JSON.

    SELECT j.*
    FROM OPENROWSET (BULK '\\drive\data.json', SINGLE_BLOB) o
    CROSS APPLY OPENJSON (
        CONCAT('' COLLATE Latin1_General_100_CI_AS_SC_UTF8, o.BulkColumn)
        '$.Data')
      WITH (
        N NVARCHAR(250)
      ) AS j;