rest xquery marklogic cts-search

The "wildcarded" option for a "jsonPropertyValueQuery" ctsquery returns different results via the REST API than in MarkLogic Query Console


We have a local instance of MarkLogic, recently pulled from Docker Hub and started with the following bash command:

docker run --name marklogic-test -d -it -p 8000:8000 -p 8001:8001 -p 8002:8002 \
 -e MARKLOGIC_INIT=true \
 -e MARKLOGIC_ADMIN_USERNAME=admin \
 -e MARKLOGIC_ADMIN_PASSWORD='Areally!PowerfulPassword1337' \
 marklogicdb/marklogic-db:10.0-9.4-centos-1.0.0-ea4

The "Documents" database contains only two documents: sample1.json

{
   "v1": "1234",
   "v2": "ABCD",
   "v3": "0123456789" 
}

and sample2.json

{
   "v1": "5678", 
   "v2": "EFGH", 
   "v3": "9876543210"
}

If we run the following XQuery in Query Console:

xquery version "1.0-ml";

let $query := cts:and-query((
  cts:directory-query("/", "infinity"),
  cts:json-property-value-query("v3", "01*", ("wildcarded", "whitespace-sensitive", "punctuation-sensitive"))
))

return (xdmp:to-json($query), cts:search(/,$query))

The result is as expected. It returns the serialized query followed by only one document:

{
    "andQuery": {
        "queries": [
            {
                "directoryQuery": {
                    "uris": [
                        "/"
                    ],
                    "depth": "infinity"
                }
            },
            {
                "jsonPropertyValueQuery": {
                    "property": [
                        "v3"
                    ],
                    "value": [
                        "01*"
                    ],
                    "options": [
                        "punctuation-sensitive",
                        "whitespace-sensitive",
                        "wildcarded",
                        "lang=en"
                    ]
                }
            }
        ]
    }
}

{
    "v1": "1234",
    "v2": "ABCD",
    "v3": "0123456789"
}

But if I send the following REST API request:

curl --location --request POST 'http://localhost:18000/LATEST/search?format=json' --user 'admin:Areally!PowerfulPassword1337' --header 'Content-Type: application/json' --data-binary "@test_api.json"

where test_api.json has the following content:

{
    "search": {
        "ctsquery": {
            "andQuery": {
                "queries": [
                    {
                        "directoryQuery": {
                            "uris": [
                                "/"
                            ],
                            "depth": "1"
                        }
                    },
                    {
                        "jsonPropertyValueQuery": {
                            "property": [
                                "v3"
                            ],
                            "value": [
                                "01*"
                            ],
                            "options": [
                                "punctuation-sensitive",
                                "wildcarded",
                                "whitespace-sensitive",
                                "lang=en"
                            ]
                        }
                    }
                ]
            }
        },
        "options": {
            "return-plan": false,
            "return-metrics": true,
            "return-facets": true,
            "return-query": false,
            "transform-results": {
                "apply": "raw"
            },
            "page-length": 10
        }
    }
}

The response is:

{
    "snippet-format": "snippet",
    "total": 2,
    "start": 1,
    "page-length": 10,
    "results": [
        {
            "index": 1,
            "uri": "/sample1.json",
            "path": "fn:doc(\"/sample1.json\")",
            "score": 0,
            "confidence": 0,
            "fitness": 0,
            "href": "/v1/documents?uri=%2Fsample1.json",
            "mimetype": "application/json",
            "format": "json",
            "matches": [
                {
                    "path": "fn:doc(\"/sample1.json\")/object-node()",
                    "match-text": [
                        "1234 ABCD 0123456789"
                    ]
                }
            ]
        },
        {
            "index": 2,
            "uri": "/sample2.json",
            "path": "fn:doc(\"/sample2.json\")",
            "score": 0,
            "confidence": 0,
            "fitness": 0,
            "href": "/v1/documents?uri=%2Fsample2.json",
            "mimetype": "application/json",
            "format": "json",
            "matches": [
                {
                    "path": "fn:doc(\"/sample2.json\")/object-node()",
                    "match-text": [
                        "5678 EFGH 9876543210"
                    ]
                }
            ]
        }
    ],
    "metrics": {
        "query-resolution-time": "PT0.000624S",
        "snippet-resolution-time": "PT0.005684S",
        "total-time": "PT0.00713S"
    }
}

For some reason, both documents are returned, even though "confidence" is 0. How should we understand this behavior of the MarkLogic search engine? Is it a bug in the MarkLogic REST API, or are we missing something?


Solution

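  • The two calls most likely differ in filtering, not in the query itself. cts:search in Query Console runs filtered by default, whereas the REST API's default query options appear to run the search unfiltered, so index-level false positives for the wildcarded value are not removed. To see this in Query Console, look at the difference when applying the "filtered" vs. "unfiltered" option to your search.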

    A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.

    https://docs.marklogic.com/guide/performance/unfiltered#id_89797

    An unfiltered search omits the filtering step, which validates whether each candidate fragment result actually meets the search criteria. Unfiltered searches, therefore, are guaranteed to be fast, while filtered searches are guaranteed to be accurate. By default, searches are filtered; you must specify the "unfiltered" option to cts:search to return an unfiltered search.
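The comparison suggested above can be sketched in Query Console as follows. This is a minimal sketch that reuses the query from the question and assumes the same two documents in the "Documents" database; the "filtered: " and "unfiltered: " labels are only illustrative. If filtering is the cause, the unfiltered count should match the "total" reported by the REST API.

```xquery
xquery version "1.0-ml";

(: Same query as in the question :)
let $query := cts:and-query((
  cts:directory-query("/", "infinity"),
  cts:json-property-value-query("v3", "01*",
    ("wildcarded", "whitespace-sensitive", "punctuation-sensitive"))
))

return (
  (: filtered (the cts:search default): each candidate fragment is
     checked against the query, so false positives are dropped :)
  fn:concat("filtered: ", fn:count(cts:search(/, $query, "filtered"))),

  (: unfiltered: index resolution only, so fragments the indexes
     cannot rule out are returned as-is :)
  fn:concat("unfiltered: ", fn:count(cts:search(/, $query, "unfiltered")))
)
```

On the REST side, the query options accept a "search-option" setting, so adding "filtered" there (inside the "options" object of test_api.json) should make the REST result match the Query Console result. Enabling suitable wildcard indexes on the database is another way to reduce false positives from unfiltered wildcard searches; which one is appropriate depends on your performance requirements.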