I see a difference in the time taken by a CTS range query versus a SPARQL query.
CTS Range Query - took 0.8ms to get the result. The required field indexes were created to make the field query work.
cts:field-values("productid", (), (),
  cts:and-query((
    cts:field-value-query("countryCode", "us", ("unstemmed", "case-insensitive", "whitespace-insensitive", "punctuation-insensitive", "diacritic-insensitive")),
    cts:field-value-query("status", "published", ("unstemmed", "case-insensitive", "whitespace-insensitive", "punctuation-insensitive", "diacritic-insensitive"))
  ))
)
SPARQL Query - took 18ms to get the result. A TDE template was created to make the SPARQL query work.
## query
SELECT ?productid
FROM <product>
WHERE {
?productid <status> <Published>;
<countryCode> <US>.
}
TDE for product-
<?xml version="1.0" encoding="UTF-8"?>
<template xmlns="http://marklogic.com/xdmp/tde">
<context>product</context>
<enabled>true</enabled>
<collections>
<collection>product</collection>
</collections>
<triples>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("languageCode"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(languageCode)</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("countryCode"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(fn:normalize-space(xs:string(countryCode)))</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("status"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(fn:normalize-space(xs:string(status)))</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("created"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(audit/created)</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("createdBy"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(audit/createdBy)</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("updated"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(audit/updated)</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
<triple>
<subject>
<val>sem:iri(productid)</val>
<invalid-values>ignore</invalid-values>
</subject>
<predicate>
<val>sem:iri(xs:string("updatedBy"))</val>
<invalid-values>ignore</invalid-values>
</predicate>
<object>
<val>sem:iri(audit/updatedBy)</val>
<invalid-values>ignore</invalid-values>
</object>
</triple>
</triples>
</template>
Please help me understand why there is a speed/performance difference between these two types of queries.
Any help is appreciated.
Many factors are involved here, including infrastructure and the tuning of various indexes and caches. I will not attempt to quantify the difference in speed directly, but will instead help you understand the major differences between the two approaches you show.
Under the hood, the two approaches are different implementations.
Range Index based query: In that example, you are using pre-defined range indexes. These are memory mapped. Each value also includes pointers to the document fragments that contain that value (the fragment IDs are integers). This first query limits the fragments in scope via your two queries and then returns the values from the range index (which is already a unique lexicon as well).
In this case, one can think of it as an in-memory intersection of the fragment IDs of (CountryCode=US ∩ Status=Published), followed by an intersection of those IDs with the in-memory index of productId.
All in memory, no deduplication needed. At a cost of rigid, pre-configured indexes and dedicated memory.
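As a rough illustration of that intersection, here is a minimal Python sketch. It is a toy model only: the dictionaries, fragment IDs, and the field_values helper are made up for the example and do not reflect MarkLogic's actual index layout or API.

```python
# Toy model: each "index" maps a value to the set of fragment IDs containing it.
# All names and data here are hypothetical, not MarkLogic internals.
country_index = {"us": {1, 2, 3, 5}, "ca": {4, 6}}
status_index = {"published": {2, 3, 4}, "draft": {1, 5, 6}}
productid_index = {"p-100": {1}, "p-200": {2}, "p-300": {3}, "p-400": {4}}


def field_values(value_index, matching_fragments):
    """Return the values whose fragment-ID set overlaps the matching fragments."""
    return sorted(
        value
        for value, fragments in value_index.items()
        if fragments & matching_fragments
    )


# Step 1: intersect the fragment IDs of the two constraints.
matching = country_index["us"] & status_index["published"]

# Step 2: intersect those IDs with the productid index and read off the values.
result = field_values(productid_index, matching)
print(result)
```

The values come straight out of the value-keyed structure, so each product ID appears at most once without any extra deduplication step.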
SPARQL Query: In this case, you are traversing a graph of data. The query resolution is completely different: there may be deduplication happening depending on your data, and the caching mechanism and memory needs are different.
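To make the contrast concrete, here is a minimal Python sketch of resolving the two triple patterns as a join. It is a toy model under stated assumptions: the triples list, the match helper, and the naive nested-loop join are invented for illustration and are not how MarkLogic's triple index or optimizer actually works.

```python
# Toy model: triples as (subject, predicate, object) tuples. Hypothetical data.
triples = [
    ("p-200", "status", "Published"),
    ("p-200", "countryCode", "US"),
    ("p-200", "countryCode", "US"),  # same triple produced by a second document
    ("p-300", "status", "Published"),
    ("p-300", "countryCode", "CA"),
    ("p-400", "status", "Draft"),
    ("p-400", "countryCode", "US"),
]


def match(all_triples, predicate, obj):
    """Return the subjects of every triple matching one pattern."""
    return [s for s, p, o in all_triples if p == predicate and o == obj]


# Each triple pattern is resolved separately...
published = match(triples, "status", "Published")
in_us = match(triples, "countryCode", "US")

# ...then joined on the shared variable (?productid in the question).
joined = [s for s in published for t in in_us if s == t]

# Duplicate triples yield duplicate solutions, so a dedup pass may be needed.
deduplicated = sorted(set(joined))
print(joined, deduplicated)
```

Even in this simplified form there are more steps (pattern matching, join, optional deduplication) than in a direct lookup against a pre-built value index.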
Range Indexes have no moving parts. However, SPARQL queries have more items that can be tuned.
Various settings are explained here: https://docs.marklogic.com/guide/semantics/indexes
Also, if you are testing this in the SPARQL tab in Query Console, then you are relying on choices being made for you regarding options. The optimizer level and other options are documented here: https://docs.marklogic.com/sem:sparql