I have a table myTable
with three XML columns of the same structure:
CREATE TABLE myTable
(
Field1 XML,
Field2 XML,
Field3 XML
);
INSERT INTO myTable (Field1, Field2, Field3)
VALUES
('<div class="Class1">
<p>123</p>
<p>456</p>
</div>',
'<div class="Class2">
<p>abc</p>
<p>def</p>
</div>',
'<div class="Class3">
<p>XYZ</p>
<p>AEIOU</p>
</div>')
I would like to extract the values between the <p>
tags, which I am able to accomplish for a single field using nodes()
. However, when I try to bring in the other fields, it creates a cross reference of all the values, rather than showing each one's first, second, etc.
This is my code:
SELECT
a.b.value('.','nvarchar(max)'),
c.d.value('.','nvarchar(max)'),
e.f.value('.','nvarchar(max)')
FROM
myTable
CROSS APPLY
myTable.Field1.nodes('div/p') a(b)
CROSS APPLY
myTable.Field2.nodes('div/p') c(d)
CROSS APPLY
myTable.Field3.nodes('div/p') e(f)
Which produces output like this:
Field1 Field2 Field3
----------------------
123 abc XYZ
123 abc AEIOU
123 def XYZ
123 def AEIOU
456 abc XYZ
456 abc AEIOU
456 def XYZ
456 def AEIOU
I would like the output to be:
Field1 Field2 Field3
----------------------
123 abc XYZ
456 def AEIOU
The XML columns will always have the same number of nodes for a single row, but the number of nodes can vary from row to row.
Is there any way to accomplish this within a single query?
Annoyingly (as far as I am aware), you can't return the position value from a node, you can only filter to it (such as demonstrated in this answer from Roger Wolf).
Instead what you could do is create a Tally of for the number of nodes in the first XML column to get a row for each p
node for each ID. Then you can filter on the position()
in your call to the nodes
method and return 1 row per position.
As you're on 2019, you don't have access to GENERATE_SERIES
, so I use a UDF for the tally:
--Create the UDF
CREATE FUNCTION [fn].[Tally] (@LastNumber bigint, @Zero bit)
RETURNS table
AS RETURN
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
WHERE @Zero = 0
AND @LastNumber IS NOT NULL
UNION ALL
SELECT TOP (ISNULL(@LastNumber,0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4, N N5, N N6, N N7) --Up to 10,000,000 rows
SELECT I
FROM Tally T;
GO
--Create sample data
CREATE TABLE dbo.YourTable (YourID int IDENTITY,
XML1 xml,
XML2 xml,
XML3 xml);
INSERT INTO dbo.YourTable
VALUES('<div class="Class1">
<p>123</p>
<p>456</p>
</div>','<div class="Class2">
<p>abc</p>
<p>def</p>
</div>','<div class="Class3">
<p>XYZ</p>
<p>AEIOU</p>
</div>');
GO
--The actual solution
WITH NodeCounts AS(
SELECT YT.YourID,
COUNT(*) AS Nodes
FROM dbo.YourTable YT
CROSS APPLY YT.XML1.nodes('/div/p') div(p)
GROUP BY YT.YourID)
SELECT YT.YourID,
X1.p.value('(./text())[1]','varchar(30)') AS Field1,
X2.p.value('(./text())[1]','varchar(30)') AS Field2,
X3.p.value('(./text())[1]','varchar(30)') AS Field3
FROM dbo.YourTable YT
JOIN NodeCounts NC ON YT.YourID = NC.YourID
CROSS APPLY fn.Tally(NC.Nodes,1) T
CROSS APPLY YT.XML1.nodes('/div/p[position() = sql:column("T.I")]') X1(p)
CROSS APPLY YT.XML2.nodes('/div/p[position() = sql:column("T.I")]') X2(p)
CROSS APPLY YT.XML3.nodes('/div/p[position() = sql:column("T.I")]') X3(p);
Alternatively, you could achieve this without a Tally and use ROW_NUMBER
in the CTE. This might be more performant (I suggest racing your horses):
WITH NodeCounts AS(
SELECT YT.YourID,
YT.XML1,
YT.XML2,
YT.XML3,
ROW_NUMBER() OVER (PARTITION BY YT.YourID ORDER BY (SELECT NULL)) AS I
FROM dbo.YourTable YT
CROSS APPLY YT.XML1.nodes('/div/p') div(p))
SELECT NC.YourID,
X1.p.value('(./text())[1]','varchar(30)') AS Field1,
X2.p.value('(./text())[1]','varchar(30)') AS Field2,
X3.p.value('(./text())[1]','varchar(30)') AS Field3
FROM NodeCounts NC
CROSS APPLY NC.XML1.nodes('/div/p[position() = sql:column("NC.I")]') X1(p)
CROSS APPLY NC.XML2.nodes('/div/p[position() = sql:column("NC.I")]') X2(p)
CROSS APPLY NC.XML3.nodes('/div/p[position() = sql:column("NC.I")]') X3(p);