sql-serverxmlsql-server-openxml

Select certain repeating XML node by name at various levels & repeating sub-nodes as rows from within said node


Take the following simplified XML:

(An excerpt from a very expensive (and large) XML product feed from an 3rd party industry wholesaler data provider - i.e. I have no control over its schema / format / content)

<Company>
    <Code>7786</Code>
    <Brand>
        <!-- /../ -->
        <Groups>
            <Group>
                <!-- /../ -->
                <Group>
                    <!-- /../ -->
                    <Group>
                        <!-- /../ -->
                        <Product>
                            <Pip_code>3623949</Pip_code>
                            <!-- /../ -->
                            <Other_Codes>
                                <Other_Code>
                                    <Code_Description>EAN</Code_Description>
                                    <Code_Value>5013158781351</Code_Value>
                                </Other_Code>
                                <Other_Code>
                                    <Code_Description>shipper EAN</Code_Description>
                                    <Code_Value>503158781443</Code_Value>
                                </Other_Code>
                                <Other_Code>
                                    <Code_Description>AMPP</Code_Description>
                                    <Code_Value>19192411000001107</Code_Value>
                                </Other_Code>
                                <Other_Code>
                                    <Code_Description>AMPP Manf</Code_Description>
                                    <Code_Value>2061801000001104</Code_Value>
                                </Other_Code>
                            </Other_Codes>
                        </Product>
                    </Group>
                </Group>
            </Group>
        </Groups>
    </Brand>
    <Brand>
        <!-- /../ -->
        <Groups>
            <Group>
                <!-- /../ -->
                <Product>
                    <Pip_code>3265725</Pip_code>
                    <!-- /../ -->
                    <Other_Codes>
                        <Other_Code>
                            <Code_Description>Outer EAN</Code_Description>
                            <Code_Value>5013158776531</Code_Value>
                        </Other_Code>
                        <Other_Code>
                            <Code_Description>AMPP</Code_Description>
                            <Code_Value>11521811000001106</Code_Value>
                        </Other_Code>
                        <Other_Code>
                            <Code_Description>AMPP Manf</Code_Description>
                            <Code_Value>2061801000001104</Code_Value>
                        </Other_Code>
                        <Other_Code>
                            <Code_Description>EAN</Code_Description>
                            <Code_Value>5013158776500</Code_Value>
                        </Other_Code>
                    </Other_Codes>
                </Product>
            </Group>
        </Groups>
    </Brand>
</Company>

I have been tasked with producing the following output:

Target output

This is proving to be extremely difficult for two reasons:

  1. You will notice that <product> 3623949 exists within Groups/Group/Group/Group/ (3 levels of Group) whereas <product> 3265725 exists within Groups/Group/ (1 level of Group) - I need to select all <product> nodes regardless of their nesting level (ideally whenever parent node = <Group>)

  2. Note the <Other_Codes> node within <product> - I need to select the repeating child nodes (Code_Description & Code_Value) as two seperate columns, but note also that there could be any number of <Other_Code> nodes within <Other_Codes> and I need to select them all (rendering [i] notation useless)

I've tried my best to get this data using TSQL OPENXML, but the only way I can think of doing it requires me to duplicate code for the various Group/Group/ levels and I also have to hard code a set number of occurrences of Other_Code using [0] notation - which given their variable nature; is wrong.

SELECT  *
FROM    OPENXML (@idoc, 'Company/Brand/Groups/Group/Product',2)
                        WITH (
                          PIP_code CHAR (20) 'Pip_code',
                          OtherCodeType CHAR (20) 'Other_Codes/Other_Code[1]/Code_Description',
                          OtherCodeValue CHAR (30) 'Other_Codes/Other_Code[1]/Code_Value',
                          OtherCodeType2 CHAR (20) 'Other_Codes/Other_Code[2]/Code_Description',
                          OtherCodeValue2 CHAR (30) 'Other_Codes/Other_Code[2]/Code_Value',
                          OtherCodeType3 CHAR (20) 'Other_Codes/Other_Code[3]/Code_Description',
                          OtherCodeValue3 CHAR (30) 'Other_Codes/Other_Code[3]/Code_Value'
                  )

We would prefer to do this in TSQL (as this XML file is already being processed there for other nodes & paths not relevant here), we have SQL Server 2008, 2008 R2 & SQL Server 2014 at our disposal, but non-SQL solutions would also be helpful - we really have hit a brick wall on this one.


Solution

  • So what you really want to do is have your selection statement be //Other_Codes - that double forward slash says you want every Other_Codes node, regardless of its location in the hierarchy, giving you one row for each Other_Codes node. Then, you can specify that PipCode is ../../Pip_code, assuming of course this hierarchy doesn't change all the time for this portion as well (but how would that even semantically make sense?).

    This:

    DECLARE @doc xml = '<Company>
        <Code>7786</Code>
        <Brand>
            <!-- /../ -->
            <Groups>
                <Group>
                    <!-- /../ -->
                    <Group>
                        <!-- /../ -->
                        <Group>
                            <!-- /../ -->
                            <Product>
                                <Pip_code>3623949</Pip_code>
                                <!-- /../ -->
                                <Other_Codes>
                                    <Other_Code>
                                        <Code_Description>EAN</Code_Description>
                                        <Code_Value>5013158781351</Code_Value>
                                    </Other_Code>
                                    <Other_Code>
                                        <Code_Description>shipper EAN</Code_Description>
                                        <Code_Value>503158781443</Code_Value>
                                    </Other_Code>
                                    <Other_Code>
                                        <Code_Description>AMPP</Code_Description>
                                        <Code_Value>19192411000001107</Code_Value>
                                    </Other_Code>
                                    <Other_Code>
                                        <Code_Description>AMPP Manf</Code_Description>
                                        <Code_Value>2061801000001104</Code_Value>
                                    </Other_Code>
                                </Other_Codes>
                            </Product>
                        </Group>
                    </Group>
                </Group>
            </Groups>
        </Brand>
        <Brand>
            <!-- /../ -->
            <Groups>
                <Group>
                    <!-- /../ -->
                    <Product>
                        <Pip_code>3265725</Pip_code>
                        <!-- /../ -->
                        <Other_Codes>
                            <Other_Code>
                                <Code_Description>Outer EAN</Code_Description>
                                <Code_Value>5013158776531</Code_Value>
                            </Other_Code>
                            <Other_Code>
                                <Code_Description>AMPP</Code_Description>
                                <Code_Value>11521811000001106</Code_Value>
                            </Other_Code>
                            <Other_Code>
                                <Code_Description>AMPP Manf</Code_Description>
                                <Code_Value>2061801000001104</Code_Value>
                            </Other_Code>
                            <Other_Code>
                                <Code_Description>EAN</Code_Description>
                                <Code_Value>5013158776500</Code_Value>
                            </Other_Code>
                        </Other_Codes>
                    </Product>
                </Group>
            </Groups>
        </Brand>
    </Company>';
    
    DECLARE @idoc int;
    
    exec sp_xml_preparedocument @idoc OUTPUT, @doc;
    SELECT  *
    FROM    OPENXML(@idoc, '//Other_Code',1)
    WITH (
        PipCode CHAR (20) '../../Pip_code',
        Code_Description CHAR(20) 'Code_Description',
        Code_Value CHAR (30) 'Code_Value'
    )
    

    Produces this:

    PipCode                 Code_Description        Code_Value
    -----------------------------------------------------------------
    3623949                 EAN                     5013158781351                 
    3623949                 shipper EAN             503158781443                  
    3623949                 AMPP                    19192411000001107             
    3623949                 AMPP Manf               2061801000001104              
    3265725                 Outer EAN               5013158776531                 
    3265725                 AMPP                    11521811000001106             
    3265725                 AMPP Manf               2061801000001104              
    3265725                 EAN                     5013158776500