I'm trying to understand what I may be doing wrong or not understanding concerning using user-defined functions in Tcl when not using the -deterministic flag; and the "lifetime" of the caching of its results for the same input.
For example, I was using this simple query and changed to the user-defined function; and, forgetting about the importance of the -deterministic flag, took a great deal of time trying to figure out why get_key() is invoked for every row in table doc_pointers when it is always 7.
select *
from
doc_pointers
--where key = order_keys->>:startIdx -- 7
where key = get_key('a',:startIdx); -- 7
I mean "always" only during the "life" of the select
statement, because another query in the CTE of which this is part, may change the order of the keys and :startIdx may no longer return 7. Of course, the same is true of the json_array method; and it doesn't appear to be invoked anew for every row in doc_pointers.
Also, in this segment of a recursive query, the non-recursive portion invokes get_key('a',18)
for every row of table doc_pointers; but, in the recursive portion, get_key('b', p.order_idx + :loopDir)
is not invoked for every row.
I know this because I print to stdout for every invocation of get_key() and can see an 'a' row for every row in doc_pointers and only the small number of 'b' rows expected. I was expecting one 'a' row which is what occurs if the where
clause is r.key = 7
.
I can set get_key('a',18)
to variable such as :startKey since it is used only once, but cannot for get_key('b', p.order_idx + :loopDir)
because it depends on the looping index.
Thus, the questions are, 1) How long will the -deterministic flag cache :startIdx as being 7 indefinitely? and 2) Is it "safe" to use the -deterministic flag in this case? 3) Why does it appear that the two part fo the recursive query treat get_key() differently.
I should have added that I tried experimenting with using the -determinstic flag and it appears that the caching of values does not last beyond the current statement but I'm asking to hear from someone who knows.
Thank you.
pieces( order_idx, key, buffer_id, char_start, char_length, char_begin ) as
(
select
18,
r.key,
r.buffer_id,
r.char_start,
r.char_length,
598
from
doc_pointers r
where
r.key = get_key('a',18)
UNION ALL
select
order_idx + :loopDir,
r.key,
r.buffer_id,
r.char_start,
r.char_length,
p.char_begin +
iif(:loopDir > 0, p.char_length, -r.char_length)
from
doc_pointers r,
pieces p
where
r.key = get_key('b', p.order_idx + :loopDir)
and (
(
:loopDir > 0
and p.char_length + p.char_begin <= cast(:breakPt as integer)
)
or
( :loopDir < 0
and p.char_begin >= cast(:breakPt as integer)
)
)
)
The -deterministic flag caches the result of a function for a specific set of argument values. It does not cache how each argument got its value. Calling the function with a fixed value, a variable, or a more complex expression doesn't make any difference for the caching.
So, you should only use the -deterministic flag on a function that produces the same result for the same inputs every time. This is true if the function produces a result that only depends on the provided arguments. If the function uses additional information that may change, it is not deterministic.
To clarify: You shouldn't worry about how the function will be invoked. Whether that is using fixed or variable arguments is irrelevant. You also don't need to care how long the result is cached. It is just an optimization. If the entry ever gets flushed from the cache, sqlite will simply call the function again the next time it needs the value. If the function is truly deterministic, the newly calculated value will be exactly the same as the value that was lost from the cache.
How long a result is cached may depend on unpredictable factors such as available memory and which other functions are cached between invocations. It is not really feasible to try to take this into account in your function.
Concrete: When $startIdx = 18 and get_key('a', :startIdx)
returns 7, what is cached is: get_key('a', 18) = 7.
If you next change $startIdx to 23 and call get_key('a', :startIdx)
again, the function will be called again because the arguments don't match the cached result.
However, if you would call get_key('a', 18)
, the cached result will be used. Even though you used a fixed variable this time, instead of a variable.