sql-serverxmlt-sql

Order of XML nodes from document preserved in insert?


If I do:

INSERT INTO dst
SELECT blah
FROM src
CROSS APPLY xmlcolumn.nodes('blah')

where dst has an identity column, can one say for certain that the identity column order matches the order of the nodes from the original XML document?

I think the answer is no, there are no guarantees and that to ensure the ordering is able to be retained, some ordering information needs to also be extracted from the XML at the same time the nodes are enumerated.


Solution

  • There's no way to see it explicitly in an execution plan, but the id column returned by the nodes() method is a varbinary(900) OrdPath, which does encapsulate the original xml document order.

    The solution offered by Mikael Eriksson on the related question Does the `nodes()` method keep the document order? relies on the OrdPath to provide an ORDER BY clause necessary to determine how identity values are assigned for the INSERT.

    A slightly more compact usage follows:

    CREATE TABLE #T 
    (
        ID integer IDENTITY, 
        Fruit nvarchar(10) NOT NULL
    );
    
    DECLARE @xml xml = 
        N'
        <Fruits>
          <Apple />
          <Banana />
          <Orange />
          <Pear />
        </Fruits>
        ';
    
    INSERT #T 
        (Fruit)
    SELECT 
        N.n.value('local-name(.)', 'nvarchar(10)') 
    FROM @xml.nodes('/Fruits/*') AS N (n)
    ORDER BY
        ROW_NUMBER() OVER (ORDER BY N.n);
    
    SELECT 
        T.ID, 
        T.Fruit 
    FROM #T AS T
    ORDER BY
        T.ID;
    

    db<>fiddle

    Using the OrdPath this way is presently undocumented, but the technique is sound in principle:

    1. The OrdPath reflects document order.
    2. The ROW_NUMBER computes sequence values ordered by OrdPath*.
    3. The ORDER BY clause uses the row number sequence.
    4. Identity values are assigned to rows as per the ORDER BY.

    To be clear, this holds even if parallelism is employed. As Mikael says, the dubious aspect is using id in the ROW_NUMBER since id is not documented to be the OrdPath.


    * The ordering is not shown in plans, but optimizer output using TF 8607 contains:

    ScaOp_SeqFunc row_number order[CALC:QCOL: XML Reader with XPath filter.id ASC]