sql-serverxqueryopen-jsonfor-jsonsql-openxml

SQL Server reduce recurring XML nodes to JSON array


I have some XML in which every entry can contain some recurring elements. I'm trying to query it with OpenXML function and I want to reduce those elements to JSON arrays. My SQL looks like this:

declare @idoc int,
        @xml xml = '
<?xml version="1.0" encoding="UTF-8"?>
<collection>
    <individual>
        <id>1</id>
        <address>
            <coutry>Country1</coutry>
            <zip>ZIP1</zip>
            <city>City1</city>
        </address>
        <address>
            <coutry>Country2</coutry>
            <zip>ZIP2</zip>
            <city>City2</city>
        </address>
        <document>
            <num>101</num>
            <issued>2020-01-01</issued>
            <description>desc1</description>
        </document>
        <document>
            <num>102</num>
            <issued>2020-01-01</issued>
            <description>desc2</description>
        </document>
    </individual>
    <individual>
        <id>2</id>
        <address>
            <coutry>Country3</coutry>
            <zip>ZIP3</zip>
            <city>City3</city>
        </address>
        <address>
            <coutry>Country4</coutry>
            <zip>ZIP4</zip>
            <city>City4</city>
        </address>
        <document>
            <num>103</num>
            <issued>2020-01-03</issued>
            <description>desc3</description>
        </document>
        <document>
            <num>104</num>
            <issued>2020-01-04</issued>
            <description>desc4</description>
        </document>
    </individual>
</collection>';

exec sp_xml_preparedocument @idoc out, @xml;

select
    id as ID
    , address as AddressesJson
    , document as DocumentsJson
from openxml(@idoc, '//individual', 2) with (
    id int
    , address nvarchar(max)
    , document nvarchar(max)
);

exec sp_xml_removedocument @idoc;

The rusult I'm getting is

|ID |AddressesJson      |DocumentsJson      |
|---|-------------------|-------------------|
|1  |Country1ZIP1City1  |1012020-01-01desc1 |
|2  |Country3ZIP3City3  |1032020-01-03desc3 |

What I would like to get is

|ID |AddressesJson      |DocumentsJson      |
|---|-------------------|-------------------|
|1  |[{"coutry":"Country1","zip":"ZIP1","city":"City1"},{"coutry":"Country2","zip":"ZIP2","city":"City2"}]  |[{"num":"101","issued":"2020-01-01","description":"desc1"},{"num":"102","issued":"2020-01-02","description":"desc2"}]  |
|2  |[{"coutry":"Country3","zip":"ZIP3","city":"City3"},{"coutry":"Country4","zip":"ZIP4","city":"City4"}]  |[{"num":"103","issued":"2020-01-03","description":"desc3"},{"num":"104","issued":"2020-01-04","description":"desc4"}]  |

How can I achieve this?

P.S. I'm using OpenXML because it seems to work faster. I would also appreciate a solution with xml.nodes()/xquery


Solution

  • Seems a couple of subqueries and a JSON PATH is what you want here. Note, as well, I had to amend your xml to remove the leading line break, as that actually makes the value an invalid xml value:

    DECLARE @idoc int,
            @xml xml = '<?xml version="1.0" encoding="UTF-8"?>
    <collection>
        <individual>
            <id>1</id>
            <address>
                <coutry>Country1</coutry>
                <zip>ZIP1</zip>
                <city>City1</city>
            </address>
            <address>
                <coutry>Country2</coutry>
                <zip>ZIP2</zip>
                <city>City2</city>
            </address>
            <document>
                <num>101</num>
                <issued>2020-01-01</issued>
                <description>desc1</description>
            </document>
            <document>
                <num>102</num>
                <issued>2020-01-01</issued>
                <description>desc2</description>
            </document>
        </individual>
        <individual>
            <id>2</id>
            <address>
                <coutry>Country3</coutry>
                <zip>ZIP3</zip>
                <city>City3</city>
            </address>
            <address>
                <coutry>Country4</coutry>
                <zip>ZIP4</zip>
                <city>City4</city>
            </address>
            <document>
                <num>103</num>
                <issued>2020-01-03</issued>
                <description>desc3</description>
            </document>
            <document>
                <num>104</num>
                <issued>2020-01-04</issued>
                <description>desc4</description>
            </document>
        </individual>
    </collection>';
    
    SELECT c.i.value('(id/text())[1]','int') AS id,
           (SELECT i.a.value('(coutry/text())[1]','varchar(30)') AS country, --It's spelt country, I suggest fixing this at your source, as fundament typographical errors like this can be a real problem later down the line
                   i.a.value('(zip/text())[1]','varchar(30)') AS zip,
                   i.a.value('(city/text())[1]','varchar(30)') AS city
            FROM c.i.nodes('address')i(a)
            FOR JSON PATH) AS AddressJson,
           (SELECT i.d.value('(num/text())[1]','int') AS num,
                   i.d.value('(issued/text())[1]','date') AS issued,
                   i.d.value('(description/text())[1]','varchar(30)') AS description
            FROM c.i.nodes('document')i(d)
            FOR JSON PATH) AS DocumentJson
    FROM @xml.nodes('collection/individual') c(i);
    

    db<>fiddle