t-sqlxqueryxquery-sql

TSQL XML - Node attributes as columns or rows in same query


I'm working with an XML file with a node similar to this:

<Process>    
    <Step No="1" Types="D" Temp="25" Secs="6" Macro="2">Enable Mixers</Step>
    <Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
    <Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step>
</Process>

DDL:

DROP TABLE IF EXISTS MyXML2
GO
CREATE TABLE MyXML2(ID INT IDENTITY(1,1), c XML)
INSERT MyXML2(c) VALUES 
    ('<Process><Step No="1" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
     <Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
     <Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step></Process>')
GO

I need to get a database structure like this (example given for only 1 "Step" above):

StepNumber ColumnName ColumnValue
1 Types D
1 Temp 25
1 Secs 6
1 Macro 2

My work so far: I've been able to map each attribute into a row.(Edit: updated to working example based on DDL above)

SELECT 
    col.value('local-name(.)', 'VARCHAR(50)') AS ColumnName,
    col.value('.[1]', 'VARCHAR(MAX)') AS ColumnValue
FROM [MyXML2]
CROSS APPLY [c].nodes('/Process/Step/@*') doc(col)

The output looks like:

But I need the "No" attribute to be a column. Is there a way to do this all in one query?

Fiddle: http://sqlfiddle.com/#!18/e56828/9


Solution

  • Since originally you did not provided DDL+DML, I am presents two samples. One for table which has identification column (ID in my sample) and one without (which mean that I need to add one dynamically using ROW_NUMBER)

    Demo one: When we have identification column

    -- DDL+DML : this is something that the OP should provide!!!
    DROP TABLE IF EXISTS MyXML2
    GO
    CREATE TABLE MyXML2(ID INT IDENTITY(1,1), c XML)
    INSERT MyXML2(c) VALUES 
        ('<Step No="1" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>'),
        ('<Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>'),
        ('<Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step>')
    GO
    
    -- Solution
    ;With MyCTE as (
        SELECT 
            MyXML2.ID,
            doc.Col.value('local-name(.[1])','VARCHAR(100)') ColumnName,
            doc.Col.value('.[1]','VARCHAR(100)') ColumnValue
        FROM MyXML2
        CROSS APPLY MyXML2.c.nodes('/Step/@*') doc(Col)
    )
    select
        StepNumber = (SELECT MyIn.ColumnValue from MyCTE as MyIn where MyIn.ColumnName = 'No' and MyIn.ID = MyCTE.ID)
        ,ColumnName,ColumnValue
    from MyCTE
    WHERE not ColumnName = 'No'
    GO
    

    Demo two: When we do not have identification column(s)

    -- DDL+DML : this is something that the OP should provide!!!
    DROP TABLE IF EXISTS MyXML
    GO
    CREATE TABLE MyXML(c XML)
    INSERT MyXML(c) VALUES 
        ('<Step No="1" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>'),
        ('<Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>'),
        ('<Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step>')
    GO
    
    -- Solution
    ;With MyCTE1 AS (SELECT RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), c FROM MyXML)
    , MyCTE2 as (
        SELECT 
            MyCTE1.RN,
            doc.Col.value('local-name(.[1])','VARCHAR(100)') ColumnName,
            doc.Col.value('.[1]','VARCHAR(100)') ColumnValue
        FROM MyCTE1
        CROSS APPLY MyCTE1.c.nodes('/Step/@*') doc(Col)
    )
    select
        StepNumber = (SELECT MyIn.ColumnValue from MyCTE2 as MyIn where MyIn.ColumnName = 'No' and MyIn.RN = MyCTE2.RN)
        ,ColumnName,ColumnValue
    from MyCTE2
    WHERE not ColumnName = 'No'
    GO
    

    Result as expected:

    enter image description here


    Update: 2021-12-06

    Following the new information which we got, here are some new solutions and explanation. The above should be useful for future readers which have similar question.

    So, in the above solutions I focused on a case where we have single Step node in each row in the table. According the new information we might have multiples nodes of Step in the same value. Moreover, the Step nodes are wrapped in another node name Process

    For example, a specific XML value can be: <Process><Step No="1" Types="D1" Temp="1" Secs="61" Macro="21">Enable Mixers1</Step> <Step No="11" Types="D11" Temp="11" Secs="611" Macro="21">Enable Mixers2</Step> <Step No="111" Types="D111" Temp="111" Secs="6111" Macro="23">Enable Mixers3</Step></Process>

    Demo three: using variable, Step nodes structure is unknown, multiple Step nodes

    In this demo I will resent solution based on the same approach as solution one

    declare @xml XML = '<Process><Step No="1" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers2</Step>
         <Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step></Process>'
    
    -->>> HIGHLY recommended to un-comment below lines and check what I am using as input for the CTE in this solution
    --SELECT
    --  t.c.value('./@No', 'VARCHAR(128)') as StepNumber,
    --  t.c.query ('.') as Types
    --from @xml.nodes('Process/.[1]/*')as t(c)
        
    ;With MyCTE01 as (
        SELECT
            t.c.value('./@No', 'INT') as StepNumber,
            t.c.query ('.') as MyXML
        from @xml.nodes('Process/.[1]/*')as t(c)
    )
    SELECT 
        MyCTE01.StepNumber,
        doc.Col.value('local-name(.[1])','VARCHAR(100)') ColumnName,
        doc.Col.value('.[1]','VARCHAR(100)') ColumnValue
    FROM MyCTE01
    CROSS APPLY MyCTE01.MyXML.nodes('/Step/@*') doc(Col)
    WHERE not doc.Col.value('local-name(.[1])','VARCHAR(100)') = 'No'
    GO
    

    This solution will work for you but if the structure of the Step node is always the same - meaning you have the same attributes as in al the examples during the discussion, then we can get much much better solutions...

    Demo four: Using variable, Step nodes has a known structure, multiple Step nodes

    Since we know which attribute we have then we can hard coded use the names. In this case we do not this part which meant to find all the attributes CROSS APPLY MyCTE01.MyXML.nodes('/Step/@*')

    We can use totally different approach, directly getting the values of the know attributes and using UNPIVOT. This solution provide much better performance but it is less flexible then solutions three.

    declare @xml XML = '<Process><Step No="1" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="11" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers2</Step>
         <Step No="2" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step></Process>'
    
    --select 
    --  t.c.value('./@No', 'VARCHAR(128)') as id,
    --  t.c.value('./@Types', 'VARCHAR(128)') as Types,
    --  t.c.value('./@Temp', 'VARCHAR(128)') as Temp,
    --  t.c.value('./@Secs', 'VARCHAR(128)') as Secs,
    --  t.c.value('./@Macro', 'VARCHAR(128)') as Macro,
    --  t.c.value('./@Macro', 'VARCHAR(128)') as Macro
    --from @xml.nodes('Process/.[1]/*')as t(c)
    
    SELECT StepNumber, Column_Name, Column_Value
    FROM(
        select 
            t.c.value('./@No', 'VARCHAR(128)')    as StepNumber,
            t.c.value('./@Types', 'VARCHAR(128)') as Types,
            t.c.value('./@Temp', 'VARCHAR(128)')  as Temp,
            t.c.value('./@Secs', 'VARCHAR(128)')  as Secs,
            t.c.value('./@Macro', 'VARCHAR(128)') as Macro
        from @xml.nodes('Process/.[1]/*')as t(c)
       ) p  
    UNPIVOT  
       (Column_Value FOR Column_Name IN (Types, Temp, Secs, Macro)  )AS unpvt;  
    GO
    

    Note! You can use this approach for unknown structure as well if you use dynamic queries and first find the attributes in the XML.

    Demo five: Using variable, Step nodes has a known structure, multiple Step nodes

    This solution has the same limitation as solution four (known structure) but in addition it only fits when we are working on single value like a variable. Therefore, if we want to implement it on table then we might need to loop all rows which might reduce performance dramatically. But when this solution fits the needs then it should provide best performance!

    /***BEST SOLUTION - if fits the needs***/
    -- XML to Tabular using OPENXML
    DECLARE @idoc INT, @xml XML = '<Process><Step No="1" Types="D1" Temp="1" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="11" Types="D11" Temp="11" Secs="611" Macro="21">Enable Mixers2</Step>
         <Step No="111" Types="D111" Temp="111" Secs="6111" Macro="23">Enable Mixers3</Step></Process>'
    --Create an internal representation of the XML document.  
    -- Reads the XML text -> parses the text by using the MSXML parser -> and provides the parsed document in a state ready for consumption. 
    EXEC sp_xml_preparedocument @idoc OUTPUT, @xml;
    
    --SELECT 
    --  No    as StepNumber,
    --  Types as Types,
    --  Temp  as Temp,
    --  Secs  as Secs,
    --  Macro as Macro,
    --  NoteValue
    --FROM OPENXML (@idoc, '/Process/Step')  
    --  WITH (
    --      -- When OPENXML does not have input of third parameter then we can choose if this will atribute or node
    --      -- usig '@No' will bring the value of atribute and using 'No' will bring the value of node
    --      No        INT          '@No'   ,
    --      Types     VARCHAR(128) '@Types',  
    --      Temp      VARCHAR(128) '@Temp' ,  
    --      Secs      VARCHAR(128) '@Secs' ,  
    --      Macro     VARCHAR(128) '@Macro', 
    --      NoteValue VARCHAR(128) '.'
    --  ) 
    
    SELECT StepNumber, Column_Name, Column_Value
    FROM(
        SELECT 
        No    as StepNumber,
        Types as Types,
        Temp  as Temp,
        Secs  as Secs,
        Macro as Macro
        FROM OPENXML (@idoc, '/Process/Step',1)  
            WITH (
                No    INT,
                Types VARCHAR(128),  
                Temp  VARCHAR(128),  
                Secs  VARCHAR(128),  
                Macro VARCHAR(128)
            ) 
       ) p  
    UNPIVOT  
       (Column_Value FOR Column_Name IN (Types, Temp, Secs, Macro)  )AS unpvt;  
    --sp_xml_removedocument free's up the memory.  
    EXEC sp_xml_removedocument @idoc   
    GO
    

    So... we have multiple approaches whic fits different case... but we still need to think about tables...

    Demo six: Using table, Step nodes has unknown structure, multiple Step nodes

    You can implement Demo four if this fit (known structure or using dynamic query), but for the last demo I will implement Demo three approach on a case that we have multiple rows in a table which each row includes XML that has multiple Step nodes

    DROP TABLE IF EXISTS MyXML_Tbl
    GO
    CREATE TABLE MyXML_Tbl(ID INT IDENTITY(1,1), MyXML XML)
    GO
    INSERT MyXML_Tbl(MyXML) VALUES 
        ('<Process><Step No="1" Types="D1" Temp="1" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="11" Types="D11" Temp="11" Secs="611" Macro="21">Enable Mixers1</Step>
         <Step No="111" Types="D111" Temp="111" Secs="6111" Macro="23">Enable Mixers3</Step></Process>')
    INSERT MyXML_Tbl(MyXML) VALUES 
        ('<Process><Step No="2" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="22" Types="D1" Temp="11" Secs="61" Macro="21">Enable Mixers1</Step>
         <Step No="222" Types="D2" Temp="22" Secs="62" Macro="23">Enable Mixers3</Step></Process>')
    GO
    
    --SELECT * FROM MyXML_Tbl
    --GO
    
    --SELECT
    --  tb.ID,
    --  tx.c.value('./@No', 'VARCHAR(128)') as StepNumber,
    --  tx.c.query ('.') as Types
    --from MyXML_Tbl tb
    --CROSS APPLY tb.MyXML.nodes('Process/.[1]/*')as tx(c)
    
    ;With MyCTE01 as (
        SELECT
            tb.ID,
            tx.c.value('./@No', 'VARCHAR(128)') as StepNumber,
            tx.c.query ('.') as MyXML
        from MyXML_Tbl tb
        CROSS APPLY tb.MyXML.nodes('Process/.[1]/*')as tx(c)
    )
    SELECT 
        MyCTE01.id,
        MyCTE01.StepNumber,
        doc.Col.value('local-name(.[1])','VARCHAR(100)') ColumnName,
        doc.Col.value('.[1]','VARCHAR(100)') ColumnValue
    FROM MyCTE01
    CROSS APPLY MyCTE01.MyXML.nodes('/Step/@*') doc(Col)
    WHERE not doc.Col.value('local-name(.[1])','VARCHAR(100)') = 'No'
    GO
    

    I hope this is useful. It should cover all cases mentioned in the discussion