I have a table on a SQL Server 2014 database that stores auditing information for record changes in a VARCHAR(MAX)
column (poor man's CDC.)
This data is in following format:
<span class="fieldname">Assigned To</span>
changed from <span class="oldvalue">user1</span>
to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
changed from <span class="oldvalue">QA</span>
to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
changed from <span class="oldvalue">Yes</span>
to <span class="newvalue">No</span><br />
...
I need to parse that information in order to retrieve the data transposed so that it looks like so:
Record FieldName OldValue NewValue
------ --------- -------- --------
1234 Assigned To user1 user2
1234 Status QA Development
1234 Progress Yes No
The stored procedure attempts to do this by converting the data to XML and then using XPath retrieving the necessary pieces:
;WITH TT AS (
SELECT TransId,
CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM dbo.Trans
WHERE TransDate >= '11/1/2016'
AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
TransXml,
FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJF2GRB7g
The corresponding IO stats:
Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.
This query is excruciatingly slow (the example was for just 10 days' worth of data,) and gets progressively slower with more data.
What are my options for tuning this query?
What you really need to speed things up is some xml indexing. However, since you are creating the XML on the fly, this isn't happening. Effectively, this is the broad equivalent of a CROSS JOIN, and will get exponentially slower as time goes by.
See cross apply xml query performs exponentially worse as xml document grows for a detailed discussion and how indexing helps. If you want to do this via XML, you really will need to store XML so you can index the XML.