Take the following simplified XML:
(An excerpt from a very expensive (and large) XML product feed from an 3rd party industry wholesaler data provider - i.e. I have no control over its schema / format / content)
<Company>
<Code>7786</Code>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Product>
<Pip_code>3623949</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158781351</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>shipper EAN</Code_Description>
<Code_Value>503158781443</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>19192411000001107</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Group>
</Group>
</Groups>
</Brand>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Product>
<Pip_code>3265725</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>Outer EAN</Code_Description>
<Code_Value>5013158776531</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>11521811000001106</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158776500</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Groups>
</Brand>
</Company>
I have been tasked with producing the following output:
This is proving to be extremely difficult for two reasons:
You will notice that <product>
3623949 exists within Groups/Group/Group/Group/
(3 levels of Group) whereas <product>
3265725 exists within Groups/Group/
(1 level of Group) - I need to select all <product>
nodes regardless of their nesting level (ideally whenever parent node = <Group>
)
Note the <Other_Codes>
node within <product>
- I need to select the repeating child nodes (Code_Description & Code_Value) as two seperate columns, but note also that there could be any number of <Other_Code>
nodes within <Other_Codes>
and I need to select them all (rendering [i] notation useless)
I've tried my best to get this data using TSQL OPENXML
, but the only way I can think of doing it requires me to duplicate code for the various Group/Group/ levels and I also have to hard code a set number of occurrences of Other_Code using [0] notation - which given their variable nature; is wrong.
SELECT *
FROM OPENXML (@idoc, 'Company/Brand/Groups/Group/Product',2)
WITH (
PIP_code CHAR (20) 'Pip_code',
OtherCodeType CHAR (20) 'Other_Codes/Other_Code[1]/Code_Description',
OtherCodeValue CHAR (30) 'Other_Codes/Other_Code[1]/Code_Value',
OtherCodeType2 CHAR (20) 'Other_Codes/Other_Code[2]/Code_Description',
OtherCodeValue2 CHAR (30) 'Other_Codes/Other_Code[2]/Code_Value',
OtherCodeType3 CHAR (20) 'Other_Codes/Other_Code[3]/Code_Description',
OtherCodeValue3 CHAR (30) 'Other_Codes/Other_Code[3]/Code_Value'
)
We would prefer to do this in TSQL (as this XML file is already being processed there for other nodes & paths not relevant here), we have SQL Server 2008, 2008 R2 & SQL Server 2014 at our disposal, but non-SQL solutions would also be helpful - we really have hit a brick wall on this one.
So what you really want to do is have your selection statement be //Other_Codes
- that double forward slash says you want every Other_Codes
node, regardless of its location in the hierarchy, giving you one row for each Other_Codes
node. Then, you can specify that PipCode
is ../../Pip_code
, assuming of course this hierarchy doesn't change all the time for this portion as well (but how would that even semantically make sense?).
This:
DECLARE @doc xml = '<Company>
<Code>7786</Code>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Product>
<Pip_code>3623949</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158781351</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>shipper EAN</Code_Description>
<Code_Value>503158781443</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>19192411000001107</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Group>
</Group>
</Groups>
</Brand>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Product>
<Pip_code>3265725</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>Outer EAN</Code_Description>
<Code_Value>5013158776531</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>11521811000001106</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158776500</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Groups>
</Brand>
</Company>';
DECLARE @idoc int;
exec sp_xml_preparedocument @idoc OUTPUT, @doc;
SELECT *
FROM OPENXML(@idoc, '//Other_Code',1)
WITH (
PipCode CHAR (20) '../../Pip_code',
Code_Description CHAR(20) 'Code_Description',
Code_Value CHAR (30) 'Code_Value'
)
Produces this:
PipCode Code_Description Code_Value
-----------------------------------------------------------------
3623949 EAN 5013158781351
3623949 shipper EAN 503158781443
3623949 AMPP 19192411000001107
3623949 AMPP Manf 2061801000001104
3265725 Outer EAN 5013158776531
3265725 AMPP 11521811000001106
3265725 AMPP Manf 2061801000001104
3265725 EAN 5013158776500