Working on XQuery in SQL Server. My understanding was that using CROSS/OUTER APPLY for the nodes would help improve performance, however the top query (below) performance SIGNIFICANTLY worse than the lower. Can anyone help explain why that is, and perhaps any pointers on XQuery performance? I've searched all of the questions I can find but nothing seems directly on point.
Poor Performer
SELECT x.ApplicationId
, t.value('(reportId/text())[1]','varchar(100)') AS ReportId
, t.value('(reportType/text())[1]','varchar(100)') AS ReportType
, t.value('(tracking-number/text())[1]','varchar(50)') AS TrackingNumber
, n.value('(firstName/text())[1]','varchar(100)') AS FirstName
, n.value('(middleName/text())[1]','varchar(100)') AS MiddleInitial
, n.value('(lastName/text())[1]','varchar(100)') AS LastName
, ssn.value('(ssn/text())[1]','varchar(50)') AS SSN
, dob.value('(dob/text())[1]','varchar(30)') AS DateOfBirth
FROM #xml x
CROSS APPLY x.xmlResponse.nodes('/xml-response') t1(t)
OUTER APPLY t1.t.nodes('personPii/applicantInformation') t2(ai)
OUTER APPLY t2.ai.nodes('name') t3(n)
OUTER APPLY t2.ai.nodes('ssn') t4(ssn)
OUTER APPLY t2.ai.nodes('dob') t5(dob)
Better Performer -why?
SELECT x.ApplicationId
, x.XMLResponse.value('(/xml-response/reportId)[1]','varchar(100)') AS ReportId
, x.XMLResponse.value('(/xml-response/reportType)[1]','varchar(100)') AS ReportType
, x.XMLResponse.value('(/xml-response/tracking-number)[1]','varchar(50)') AS TrackingNumber
, x.XMLResponse.value('(/xml-response/personPii/applicantInformation/firstName)[1]','varchar(100)') AS FirstName
, x.XMLResponse.value('(/xml-response/personPii/applicantInformation/middleName)[1]','varchar(100)') AS MiddleInitial
, x.XMLResponse.value('(/xml-response/personPii/applicantInformation/lastName)[1]','varchar(100)') AS LastName
, x.XMLResponse.value('(/xml-response/personPii/applicantInformation/ssn/ssn)[1]','varchar(50)') AS SSN
, x.XMLResponse.value('(/xml-response/personPii/applicantInformation/dob/dob)[1]','varchar(30)') AS DateOfBirth
FROM #xml x
Adding an example of nodes vs simple xml:
create table #data (xml xml)
insert into #data
select '<root><data><field>X</field><field>Y</field></data></root>'
select xml.value('root[1]/data[1]/field[1]', 'nvarchar(max)')
from #data d
select n.value('(text())[1]', 'nvarchar(max)')
from #data d
cross apply xml.nodes('root/data/field') t(n)
Here, you can see the difference how nodes work. The first version cannot simply fetch more than one field value