XQuery - Filter deep child nodes for duplicates

I am trying to remove duplicates at a lower level of my element tree, because the target system cannot process them. Unfortunately, without much success so far.

The XML has several <Article> children under <Articles>. Each <Article> element can have <UNIT> elements. These need to be unique across the whole document, but only with respect to the <NR>/<COUNT> combination (the <TEXT> content is ignored).

Given the following example:

<Articles>
    <Article>
        <A1>123</A1>
        <A2>456</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM Aqfwfqf</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM hrthe</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>59</COUNT>
            <TEXT>RANDOM cutrh</TEXT>
        </UNIT>
    </Article>
    <Article>
        <A1>351</A1>
        <A2>362</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>4</COUNT>
            <TEXT>RANDOM rtjrtf</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM jrtj</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>59</COUNT>
            <TEXT>RANDOM rtjrt</TEXT>
        </UNIT>
    </Article>
</Articles>

The result should look like:

<Articles>
    <Article>
        <A1>123</A1>
        <A2>456</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM Aqfwfqf</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>59</COUNT>
            <TEXT>RANDOM cutrh</TEXT>
        </UNIT>
    </Article>
    <Article>
        <A1>351</A1>
        <A2>362</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>4</COUNT>
            <TEXT>RANDOM rtjrtf</TEXT>
        </UNIT>
    </Article>
</Articles>

I tried to string-join the two values in each <UNIT> and then delete the matching nodes, but that ended up deleting all of the matching <UNIT> elements instead of leaving one.

Getting a distinct list and counting the occurrences worked, but I couldn't delete the excess nodes.

How can I reduce each node combination to a single occurrence?

Solution

For me, the following works:

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'xml';
declare option output:indent 'yes';

declare context item := document {
<Articles>
    <Article>
        <A1>123</A1>
        <A2>456</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM Aqfwfqf</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM hrthe</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>59</COUNT>
            <TEXT>RANDOM cutrh</TEXT>
        </UNIT>
    </Article>
    <Article>
        <A1>351</A1>
        <A2>362</A2>
        <UNIT>
            <NR>59</NR>
            <COUNT>4</COUNT>
            <TEXT>RANDOM rtjrtf</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>3</COUNT>
            <TEXT>RANDOM jrtj</TEXT>
        </UNIT>
        <UNIT>
            <NR>59</NR>
            <COUNT>59</COUNT>
            <TEXT>RANDOM rtjrt</TEXT>
        </UNIT>
    </Article>
</Articles>
};


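(: copy the context item, apply the delete to the copy, and return the modified copy :)
. transform with {
    (: group every UNIT in the document by its NR/COUNT combination;
       within each group, keep the first UNIT and delete the rest :)
    delete node for $unit in //UNIT 
                group by $nr := $unit/NR, $cnt := $unit/COUNT
                return subsequence($unit, 2)
  }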

This runs against an in-memory context item. If your input is a document stored in a database, I think running the update expression on its own should work just as well:

    delete node for $unit in //UNIT 
                group by $nr := $unit/NR, $cnt := $unit/COUNT
                return subsequence($unit, 2)
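
If your processor does not support the XQuery Update Facility, the same grouping can also drive a plain recursive copy that skips the duplicates. This is only a minimal sketch under some assumptions: the function name local:copy is hypothetical, the input is assumed to live in a file called articles.xml, and every <UNIT> is assumed to have at most one <NR> and one <COUNT>.

```xquery
xquery version "3.1";

(: Hypothetical helper: copies $node recursively,
   leaving out every UNIT element that is listed in $drop. :)
declare function local:copy($node as node(), $drop as element(UNIT)*) as node()? {
  typeswitch ($node)
    case document-node() return
      document { $node/node() ! local:copy(., $drop) }
    case element() return
      if ($node intersect $drop) then ()
      else element { node-name($node) } {
        $node/@*,
        $node/node() ! local:copy(., $drop)
      }
    (: text, comments and processing instructions are returned unchanged :)
    default return $node
};

let $doc := doc("articles.xml")   (: assumed file name :)
(: same grouping as above: all but the first UNIT of each NR/COUNT group :)
let $drop :=
  for $unit in $doc//UNIT
  group by $nr := $unit/NR, $cnt := $unit/COUNT
  return tail($unit)
return local:copy($doc, $drop)
```

Here tail($unit) plays the same role as subsequence($unit, 2): after the group by, $unit holds all units of one group in document order, so everything after the first item is a duplicate.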