sql-serverxmlxml-parsingxqueryxquery-sql

Parse XML where data is pivoted


I have the following XML data

<browse result="1">
  <th>
    <td label="Company ID"></td>
    <td label="Company Name"></td>
    <td label="Country"></td>
    <td label="Region"></td>
  </th>
  <tr>
    <td>ABC01</td>
    <td>Company 1</td>
    <td>United States</td>
    <td>North America</td>
  </tr>
  <tr>
    <td>ABC02</td>
    <td>Company 2</td>
    <td>China</td>
    <td>Asia</td>
  </tr>
</browse>

my final table should look like this

Company ID Company Name Country Region
ABC01 Company 1 United States North America
ABC02 Company 2 China Asia

or its pivoted version

label value1 value2
Company ID ABC01 ABC02
Company Name Company 1 Company 2
Country United States China
Region North America Asia

This is what I have tried so far

declare @xmldata nvarchar(4000) = '<browse result="1">
  <th>
    <td label="Company ID"></td>
    <td label="Company Name"></td>
    <td label="Country"></td>
    <td label="Region"></td>
  </th>
  <tr>
    <td>ABC01</td>
    <td>Company 1</td>
    <td>United States</td>
    <td>North America</td>
  </tr>
  <tr>
    <td>ABC02</td>
    <td>Company 2</td>
    <td>China</td>
    <td>Asia</td>
  </tr>
</browse>'

select
     a.value('@label', 'varchar(max)') as label
    ,b.value('.', 'nvarchar(max)') as value
from (select top (1) CAST(REPLACE(CAST(@xmldata AS NVARCHAR(MAX)),'utf-8','utf-16') AS XML) as xmldata) as s
    outer apply s.xmldata.nodes('/browse/th/td') as l(a)
    outer apply s.xmldata.nodes('/browse/tr/td') as v(b)

but it gives me 32 rows, and most of them is wrongly duplicated combinations of the 4 labels with the values. How should I accomplish this?


Solution

  • If we can assume that the values are in static positions then you only need to look at the tr nodes, and then get each td node from that using their ordinal positions:

    DECLARE @xml xml = '<browse result="1">
      <th>
        <td label="Company ID"></td>
        <td label="Company Name"></td>
        <td label="Country"></td>
        <td label="Region"></td>
      </th>
      <tr>
        <td>ABC01</td>
        <td>Company 1</td>
        <td>United States</td>
        <td>North America</td>
      </tr>
      <tr>
        <td>ABC02</td>
        <td>Company 2</td>
        <td>China</td>
        <td>Asia</td>
      </tr>
    </browse>';
    
    SELECT b.th.value('(td/text())[1]','varchar(20)') AS CompanyID,
           b.th.value('(td/text())[2]','varchar(20)') AS CompanyName,
           b.th.value('(td/text())[3]','varchar(20)') AS Country,
           b.th.value('(td/text())[4]','varchar(20)') AS Region
    FROM @xml.nodes('browse/tr') b(th);
    

    You'll likely need to adjust the data types, but I very much doubt that varchar(MAX) is appropriate; if you are using IDs that are over 8,000 characters long then you have a problem.