xml xquery duplicate-data marklogic

How do I find duplicate data in an XML document using XQuery?


I have a bunch of documents in a MarkLogic XML database. One document has:

<colors>
  <color>red</color>
  <color>red</color>
</colors>

Having multiple colors is not a problem. Having multiple colors that are both red is a problem. How do I find the documents that have duplicate data?


Solution

  • Every XQuery expression evaluates to a sequence, so we can count the color elements and compare that with the count of their distinct values. If the two counts differ, at least one value is repeated, and that colors element belongs to the subset you want.

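    (: Select every colors element whose color children are not all distinct. :)
    for $c in fn:doc()//colors
    where fn:count($c/color) != fn:count(fn:distinct-values($c/color))
    (: This returns the matching colors element, not the whole document. :)
    return $c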
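
  • To see which values are repeated, rather than only which elements contain a repeat, a minimal sketch along the same lines might look like the following. It assumes an inline sample element in place of a stored document (the $colors variable and its contents are made up for illustration), so it can be tried without loading anything into the database.

    (: Hypothetical inline sample standing in for a stored document. :)
    let $colors :=
      <colors>
        <color>red</color>
        <color>blue</color>
        <color>red</color>
      </colors>
    (: Keep each distinct value that occurs more than once. :)
    for $v in fn:distinct-values($colors/color)
    where fn:count($colors/color[. = $v]) > 1
    return $v

    For this sample the query should yield the single value red, since blue occurs only once. Note that the comparison is on the atomized text of each color element, so values that differ in case or surrounding whitespace are treated as different.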