For a CoRB collector, I need to find the newest document in each temporal stack that is not in the latest collection. For example, given the documents:
URI       Collections
A.xml     A.xml
A.1.xml   A.xml
A.2.xml   A.xml
B.xml     B.xml, latest
B.1.xml   B.xml
C.xml     C.xml
I need a fast way to return A.xml and C.xml (but not B.xml).
The best I’ve come up with is to get a list of the main URIs (A.xml, B.xml, C.xml), loop over them, and compare each document's collections with the URI name. This is incredibly slow, though.
Alternatively, I could create two CoRB processes: the first would build the URIS_FILE by running the filtering across threads, and its output would feed a separate CoRB process. However, this adds a lot of complexity.
Is there any built-in way to achieve this?
You could use map intersection to find those URIs.
Both cts:uris() and cts:collections() have an option to return the results as a map. So, if you query for the URIs and the collections of the documents that are not in the latest collection and then take the intersection of the two maps, the keys that remain are the URIs that match their own temporal collection name, which are the URIs you are looking for. You can get them from the resulting map with map:keys().
Your CoRB URIs module would be:
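(: Match every document that is not in the "latest" collection :)
let $not-latest-query := cts:not-query(cts:collection-query("latest"))
(: Fetch the matching URIs and collection names as maps instead of sequences :)
let $uris := cts:uris("", "map", $not-latest-query)
let $collections := cts:collections("", "map", $not-latest-query)
(: Map intersection keeps only the keys that are both a URI and a collection name :)
let $most-recent-historical := map:keys($uris * $collections)
(: CoRB expects the count first, followed by the URIs :)
return (count($most-recent-historical), $most-recent-historical)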
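
To see why the intersection works, here is a minimal sketch that replays it on the sample documents from the question, without touching the lexicons. The two maps are built by hand and only stand in for what cts:uris() and cts:collections() would return for the not-latest query. Mapping each key to itself is an assumption made for illustration, so that the key-value pairs line up for the intersection.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for cts:uris("", "map", $not-latest-query):
   every sample URI except B.xml, which is in "latest". :)
let $uris := map:new((
  for $u in ("A.xml", "A.1.xml", "A.2.xml", "B.1.xml", "C.xml")
  return map:entry($u, $u)
))
(: Hypothetical stand-in for cts:collections("", "map", $not-latest-query):
   the collections found on those documents. B.xml is present because
   B.1.xml carries it. :)
let $collections := map:new((
  for $c in ("A.xml", "B.xml", "C.xml")
  return map:entry($c, $c)
))
(: Keep only the entries present in both maps. :)
return map:keys($uris * $collections)
```

With these inputs the result should be A.xml and C.xml. B.xml appears as a collection name but not as a URI, because the B.xml document is in latest, so it drops out of the intersection. The order of keys returned by map:keys() is not guaranteed.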