
XQuery - Filter deep child nodes for duplicates

I am trying to remove duplicates at a lower level beneath my elements, because the system cannot process them. So far, unfortunately, without much success.

The XML has several <Article> children under <Articles>. Each <Article> element can contain <UNIT> elements. These must be unique across the whole document, but only with respect to their <NR>/<COUNT> combination; the other children, such as <TEXT>, may differ.

Given the following example:

            <TEXT>RANDOM Aqfwfqf</TEXT>
            <TEXT>RANDOM hrthe</TEXT>
            <TEXT>RANDOM cutrh</TEXT>
            <TEXT>RANDOM rtjrtf</TEXT>
            <TEXT>RANDOM jrtj</TEXT>
            <TEXT>RANDOM rtjrt</TEXT>

The result should look like:

            <TEXT>RANDOM Aqfwfqf</TEXT>
            <TEXT>RANDOM cutrh</TEXT>
            <TEXT>RANDOM rtjrtf</TEXT>

I tried string-joining the two values in each <UNIT> and then deleting the matching nodes, but I ended up deleting all of those <UNIT> elements instead of leaving one.

Getting a distinct list and counting the occurrences worked, but I couldn't delete the excess nodes.

How can I reduce each <NR>/<COUNT> combination to a single node?
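For comparison, a distinct-list-with-counts query of the kind described above might look like the sketch below. It assumes XQuery 3.0 group by, exactly one NR and one COUNT child per UNIT, and a hypothetical sample document: the NR/COUNT/TEXT values and the combination element are made up for illustration.

```xquery
(: Hypothetical sample input; the NR/COUNT values are made up for illustration. :)
let $input := document {
  <Articles>
    <Article>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>first 1/10</TEXT></UNIT>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>second 1/10</TEXT></UNIT>
    </Article>
    <Article>
      <UNIT><NR>2</NR><COUNT>5</COUNT><TEXT>only 2/5</TEXT></UNIT>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>third 1/10</TEXT></UNIT>
    </Article>
  </Articles>
}
for $unit in $input//UNIT
(: Assumes each UNIT has exactly one NR and one COUNT child. :)
group by $nr := string($unit/NR), $cnt := string($unit/COUNT)
order by $nr, $cnt
(: After group by, $unit is bound to every UNIT of the group, in document order. :)
return <combination nr="{$nr}" count="{$cnt}" occurrences="{count($unit)}"/>
```

Because $unit holds the whole group at that point, replacing count($unit) with subsequence($unit, 2) selects exactly the surplus nodes, which is what a delete expression needs.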


  • For me, the following works:

    declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
    declare option output:method 'xml';
    declare option output:indent 'yes';
    declare context item := document {
                <TEXT>RANDOM Aqfwfqf</TEXT>
                <TEXT>RANDOM hrthe</TEXT>
                <TEXT>RANDOM cutrh</TEXT>
                <TEXT>RANDOM rtjrtf</TEXT>
                <TEXT>RANDOM jrtj</TEXT>
                <TEXT>RANDOM rtjrt</TEXT>
    };
    (: Work on a copy of the context item; in each NR/COUNT group, keep the first UNIT and delete the rest. :)
    . transform with {
        delete node for $unit in //UNIT
                    group by $nr := $unit/NR, $cnt := $unit/COUNT
                    return subsequence($unit, 2)
    }

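    This runs against an in-memory context item. If your input is a document stored in a database, I think running the updating expression directly, without the transform with wrapper, should work just as well:

        delete node for $unit in //UNIT
                    group by $nr := $unit/NR, $cnt := $unit/COUNT
                    return subsequence($unit, 2)

    In that case the deletions are applied to the stored document itself rather than to a copy.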
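Processors that implement only XQuery Update Facility 1.0 may not accept the transform with shorthand. The sketch below expresses the same group-by deletion in the older copy ... modify ... return form. It assumes XQuery 3.0 (for group by), exactly one NR and one COUNT child per UNIT, and a hypothetical sample document whose values are made up for illustration; string() is used on the grouping keys as a choice, not something the answer requires.

```xquery
(: Hypothetical sample input; the NR/COUNT values are made up for illustration. :)
let $input := document {
  <Articles>
    <Article>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>first 1/10</TEXT></UNIT>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>second 1/10</TEXT></UNIT>
    </Article>
    <Article>
      <UNIT><NR>2</NR><COUNT>5</COUNT><TEXT>only 2/5</TEXT></UNIT>
      <UNIT><NR>1</NR><COUNT>10</COUNT><TEXT>third 1/10</TEXT></UNIT>
    </Article>
  </Articles>
}
return
  (: copy makes a deep copy, so $input itself is never modified. :)
  copy $doc := $input
  modify (
    (: Group all UNITs of the copy by their NR/COUNT pair and delete every
       member of each group except the first one in document order. :)
    delete nodes
      for $unit in $doc//UNIT
      group by $nr := string($unit/NR), $cnt := string($unit/COUNT)
      return subsequence($unit, 2)
  )
  return $doc
```

With this sample, the units "second 1/10" and "third 1/10" are deleted, leaving "first 1/10" in the first Article and "only 2/5" in the second.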