xquery marklogic marklogic-corb

Find newest document in MarkLogic temporal stack even if deleted


For a CoRB collector, I need to find the newest document in each temporal stack, but only when that document is not in the latest collection. For example, given these documents:

URI       Collections
A.xml     A.xml
A.1.xml   A.xml
A.2.xml   A.xml
B.xml     B.xml latest
B.1.xml   B.xml
C.xml     C.xml

I need a fast way to return A.xml and C.xml (but not B.xml).

The best I’ve come up with is to get a list of the main URIs (A.xml, B.xml, C.xml), loop over them, and compare each document's collections with its URI. This is incredibly slow, though.

Alternatively, I could create two CoRB processes: the first builds the URIS_FILE by filtering the URIs across multiple threads, and that file is then fed into a separate CoRB process. However, this adds a lot of complexity.

Is there any built-in way to achieve this?


Solution

  • You could use map intersection to find those URIs.

    Both cts:uris() and cts:collections() have a "map" option that returns the results as a map:map. If you query for the URIs and the collections of the documents that are not in the latest collection, and then intersect the two maps, the entries they have in common are the URIs that also name a temporal document collection. Those are the URIs you want, and you can pull them out of the resulting map with map:keys().

    Your CoRB URIs module would be:

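    xquery version "1.0-ml";
    (: exclude current versions, which are in the "latest" collection :)
    let $not-latest-query := cts:not-query(cts:collection-query("latest"))
    (: URIs and collection names of the non-latest documents, each as a map:map :)
    let $uris := cts:uris("", "map", $not-latest-query)
    let $collections := cts:collections("", "map", $not-latest-query)
    (: map intersection: keep only URIs that are also the name of a collection :)
    let $most-recent-historical := map:keys($uris * $collections)
    (: a CoRB URIs module returns the count first, followed by the URIs :)
    return (count($most-recent-historical), $most-recent-historical)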
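To see why the intersection keeps only A.xml and C.xml, here is a minimal sketch that builds the two maps by hand from the example in the question instead of reading the lexicons. The literal URI lists are hypothetical stand-ins for what cts:uris() and cts:collections() would return for the non-latest documents, and each key is mapped to itself, an assumption made here so that the values on both sides of the intersection match.

```xquery
xquery version "1.0-ml";
(: hypothetical stand-in for cts:uris("", "map", $not-latest-query):
   the URIs of every document that is not in "latest"; key maps to itself :)
let $uris := map:new(
  ("A.xml", "A.1.xml", "A.2.xml", "B.1.xml", "C.xml") ! map:entry(., .))
(: hypothetical stand-in for cts:collections("", "map", $not-latest-query):
   B.xml still appears because B.1.xml is in that collection :)
let $collections := map:new(("A.xml", "B.xml", "C.xml") ! map:entry(., .))
(: the intersection keeps only entries present in both maps;
   map:keys() order is not guaranteed, so sort for readability :)
for $uri in map:keys($uris * $collections)
order by $uri
return $uri
```

B.xml drops out because it is in the latest collection, so it never appears among the URIs, while A.1.xml, A.2.xml and B.1.xml drop out because no collection carries their names.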