I have some XML in which every entry can contain some recurring elements. I'm trying to query it with OpenXML function and I want to reduce those elements to JSON arrays. My SQL looks like this:
declare @idoc int,
@xml xml = '
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
exec sp_xml_preparedocument @idoc out, @xml;
select
id as ID
, address as AddressesJson
, document as DocumentsJson
from openxml(@idoc, '//individual', 2) with (
id int
, address nvarchar(max)
, document nvarchar(max)
);
exec sp_xml_removedocument @idoc;
The rusult I'm getting is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |Country1ZIP1City1 |1012020-01-01desc1 |
|2 |Country3ZIP3City3 |1032020-01-03desc3 |
What I would like to get is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |[{"coutry":"Country1","zip":"ZIP1","city":"City1"},{"coutry":"Country2","zip":"ZIP2","city":"City2"}] |[{"num":"101","issued":"2020-01-01","description":"desc1"},{"num":"102","issued":"2020-01-02","description":"desc2"}] |
|2 |[{"coutry":"Country3","zip":"ZIP3","city":"City3"},{"coutry":"Country4","zip":"ZIP4","city":"City4"}] |[{"num":"103","issued":"2020-01-03","description":"desc3"},{"num":"104","issued":"2020-01-04","description":"desc4"}] |
How can I achieve this?
P.S. I'm using OpenXML because it seems to work faster. I would also appreciate a solution with xml.nodes()/xquery
Seems a couple of subqueries and a JSON PATH
is what you want here. Note, as well, I had to amend your xml
to remove the leading line break, as that actually makes the value an invalid xml
value:
DECLARE @idoc int,
@xml xml = '<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
SELECT c.i.value('(id/text())[1]','int') AS id,
(SELECT i.a.value('(coutry/text())[1]','varchar(30)') AS country, --It's spelt country, I suggest fixing this at your source, as fundament typographical errors like this can be a real problem later down the line
i.a.value('(zip/text())[1]','varchar(30)') AS zip,
i.a.value('(city/text())[1]','varchar(30)') AS city
FROM c.i.nodes('address')i(a)
FOR JSON PATH) AS AddressJson,
(SELECT i.d.value('(num/text())[1]','int') AS num,
i.d.value('(issued/text())[1]','date') AS issued,
i.d.value('(description/text())[1]','varchar(30)') AS description
FROM c.i.nodes('document')i(d)
FOR JSON PATH) AS DocumentJson
FROM @xml.nodes('collection/individual') c(i);