xquery marklogic marklogic-optic-api

Using Modifier Functions after the op:reduce() Function in an Optic Access Plan


Is it possible to use optic modifier functions like op:group-by(), op:where(), etc. after running the op:reduce() function? I haven't been able to find anything in the Optic API Dev Guide regarding this question. We have some data stored in TDEs that contains an effective date. At query time we pass an op:where() filter to find rows that are active and "in effect" so to speak. For the "parent" it is technically possible to have two rows "in effect". The business requirement for this case is to get the first row after sorting the rows by the effective date in descending order (i.e. get the most recent "in effect").

I'm not aware of a way to force a join operation to use only the first matching row from the right plan for each row of the left plan. My current "hack" to implement this is to perform my filters, joins, and sorting, followed by the op:reduce() function to iterate through the rows and check whether the previous row was for the same parent. If it is the same parent, I set the "isFirst" flag to fn:false(), otherwise to fn:true(). I then pass the op:result() of that reduction into op:from-literals() to perform the aggregations and counts I need on the "first" rows.

Here is pseudocode of this "hacky" implementation:

let $reducePlan := $mainPlan
    => op:reduce(function($previous as map:map*, $row as map:map) as map:map* {
            let $prevUuid := map:get($previous[fn:last()], "parentUuid")
            let $currUuid := map:get($row, "parentUuid")
            let $isFirst := 
                if ($prevUuid eq $currUuid)
                then fn:false()
                else fn:true()
            let $_ := map:put($row, "isFirst", $isFirst)
            return (
                $previous, 
                $row
            )
        },
        map:map()
    )
    => op:result()

let $literals := op:from-literals($reducePlan)
    (: Get the "first" row for each parent :)
    => op:where(op:eq(op:col("isFirst"), fn:true()))
    (:
        Other aggregations and counts here...
    :)
    => op:result()
return $literals

This produces correct results, but I worry about its performance when there are lots of parents, and about how hacky the overall approach is. I would normally use the op:group-by() function, but I need to perform aggregations on other columns whose values may not all be the same. I am not able to change the format or structure of the XML that produces the view.

Two main questions:

  1. Is there a way to limit a join to the first matching row or limit the right plan before the join to only have the first row for a "parent"?
  2. Can I change something about the way I perform the op:reduce() to leave the optic plan in place to be able to run additional modifier functions afterwards?

Solution

  • Anything you do once you no longer have a valid plan is post-processing and will not scale. Passing the result of op:reduce() into op:from-literals() effectively pulls all of the content out of the indexes and into memory.

    I would suggest that you take an approach that continues to modify plans. For your use-case, if the effectiveDate is unique per parentUuid, then an approach could be: make a plan that isolates the appropriate timestamp per parentUuid and then use that to filter the results.

    Example below does the following:

    The end result is that the three expected records are shown.

    The key here is the filter plan. It holds only two columns: the parentUuid and the effectiveDate isolated for that parent.

    Those 2 columns are enough to then use a filtering join.

    This may not be exactly what you need. However, creating keys in a certain way and ensuring that there is something unique for the aggregate is a pattern I use often with Optic.
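
The filter-plan pattern described in this answer could look roughly like the sketch below. It is not the original example: the literal rows, the "amount" column, and the names latestParent, candidateDate, and latestDate are hypothetical stand-ins. It also assumes that effectiveDate is unique per parentUuid and that op:from-literals() keeps the xs:date values comparable for op:max() and the join.

```xquery
xquery version "1.0-ml";
import module namespace op = "http://marklogic.com/optic"
    at "/MarkLogic/optic.xqy";

(: Hypothetical rows standing in for the view rows that are already "in effect".
   Parent p1 has two rows in effect; p2 and p3 have one each. :)
let $rows := (
    map:new((map:entry("parentUuid", "p1"),
             map:entry("effectiveDate", xs:date("2020-01-01")),
             map:entry("amount", 10))),
    map:new((map:entry("parentUuid", "p1"),
             map:entry("effectiveDate", xs:date("2021-06-01")),
             map:entry("amount", 20))),
    map:new((map:entry("parentUuid", "p2"),
             map:entry("effectiveDate", xs:date("2019-03-15")),
             map:entry("amount", 30))),
    map:new((map:entry("parentUuid", "p3"),
             map:entry("effectiveDate", xs:date("2022-02-02")),
             map:entry("amount", 40)))
)

(: In a real query this would be the filtered view plan, not literals. :)
let $mainPlan := op:from-literals($rows)

(: Filter plan: one row per parent with its most recent effectiveDate.
   Columns are renamed first so they cannot collide with the main plan. :)
let $filterPlan := op:from-literals($rows)
    => op:select((
        op:as("latestParent", op:col("parentUuid")),
        op:as("candidateDate", op:col("effectiveDate"))
    ))
    => op:group-by(
        op:col("latestParent"),
        op:max("latestDate", op:col("candidateDate"))
    )

return $mainPlan
    (: Keep only rows whose parent and date match the filter plan. :)
    => op:join-inner($filterPlan, (
        op:on(op:col("parentUuid"), op:col("latestParent")),
        op:on(op:col("effectiveDate"), op:col("latestDate"))
    ))
    => op:select((
        op:col("parentUuid"),
        op:col("effectiveDate"),
        op:col("amount")
    ))
    (: Further op:where() or op:group-by() calls can follow here,
       because the pipeline is still a plan at this point. :)
    => op:result()
```

Because nothing calls op:result() until the very end, the join and any later aggregations stay inside a single plan rather than being redone in memory over materialized rows.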