sqlsql-server

Remove specific data along with the angle brackets in SQL


I need to remove specific data from a SQL column along with data, I can't get the right approach.

Example input:

<div></div><div><div><div><div><p>12345</p></div></div></div></div>
<div><div><span>45678<span></span></span></div></div>
<div><div>Test No.</div></div><div><div><div><div><p>4466</p></div></div></div></div>
<div>6789</div>
<p></p><p><i>As per the attachment, no provided because we were supposed to be onboarded.</i></p>

Desired output:

12345
45678
Test No.4466
6789
As per the attachment, no provided because we were supposed to be onboarded.

Column data type is varchar NULL and I have tried the below but not working

Select Col, REPLACE(col, SUBSTRING(col, CHARINDEX('<', col), LEN(col) - CHARINDEX('>', REVERSE(col)) - CHARINDEX('<', col) + 2), '') 
from table

Just wondering don't we have straight code in SQL Server in order to remove <></> angle brackets and it's value and keep the rest?


Solution

  • Assuming your HTML is also valid XML, you can just use the following code to get all text nodes and concatenate them together.

    SELECT
      t.id,
      t.xhtml,
      t.xhtml.query('//text()').value('text()[1]', 'nvarchar(max)') AS value
    FROM t;
    

    .query('//text()') gets all the text nodes, you can use any XQuery filter to get whatever nodes you want. Then .value('text()[1]', 'nvarchar(max)') concats them all together.

    db<>fiddle