kql azure-data-explorer

How good is performance of "has" operator in Kusto when RHS is not exactly one term?


The documentation on the has operator explains how queries can benefit from term search. It also mentions that if the term is shorter than three characters, the search falls back to a column scan rather than using the term index.

Reviewing some of our monitoring queries, I found uses of has similar to these:

Message has "Heart Beat"
Message has "ends:"

Re-reading the documentation, I could not find any mention of how the search works if the RHS is more than one term, or a term plus some non-alphanumeric characters. Tests show that functionally it works as expected, correctly identifying an occurrence of the string as if it were a single whole term. Performance-wise, though, do we still benefit from the term index, or does the search fall back to scanning the column values?


Solution

  • "Heart Beat" is indexed as two words, "Heart" and "Beat" (because space is a separator).
    has searches the index for each word separately, intersects the results, and then searches the data itself (within the candidate rows) for the exact match of "Heart Beat". Note that has is case insensitive, so it looks for any case variation of these words; consider using has_cs for a case-sensitive search.

  • "ends:" is indexed as "ends" (because colon is a separator). has searches the index for "ends" and then searches the data itself (within the candidate rows) for the exact match of "ends:".
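
To make the two-step lookup concrete, here is a minimal Python sketch of the idea. It is a toy model, not Kusto's actual implementation: the tokenizer (runs of ASCII letters and digits), the three-character minimum, the sample rows, and the names terms, build_index and has are all assumptions made for illustration. The final check is a plain substring test, so it does not model how the engine treats term boundaries.

```python
import re
from collections import defaultdict

# Assumptions for this sketch: a term is a maximal run of ASCII letters
# and digits, and only terms of at least MIN_TERM_LEN characters are indexed.
MIN_TERM_LEN = 3
TERM_RE = re.compile(r"[A-Za-z0-9]+")


def terms(text):
    """Split text into lower-cased alphanumeric terms."""
    return [t.lower() for t in TERM_RE.findall(text)]


def build_index(rows):
    """Map each indexable term to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, message in enumerate(rows):
        for term in terms(message):
            if len(term) >= MIN_TERM_LEN:
                index[term].add(row_id)
    return index


def has(rows, index, phrase):
    """Toy, case-insensitive stand-in for `Message has phrase`."""
    indexable = [t for t in terms(phrase) if len(t) >= MIN_TERM_LEN]
    if indexable:
        # Step 1: intersect the row sets of every indexable term.
        candidates = set.intersection(*(index.get(t, set()) for t in indexable))
    else:
        # No usable term in the phrase: every row has to be scanned.
        candidates = set(range(len(rows)))
    # Step 2: look for the exact phrase, but only in the candidate rows.
    needle = phrase.lower()
    return sorted(r for r in candidates if needle in rows[r].lower())


# Hypothetical sample data.
rows = [
    "Heart Beat received from node-7",
    "heart rate normal, beat detected",  # both terms, but not the phrase
    "Session ends: 12:00",
    "Weekend sends report",              # "sends" is a different term
    "HEART BEAT lost",
]
index = build_index(rows)

print(has(rows, index, "Heart Beat"))  # [0, 4]
print(has(rows, index, "ends:"))       # [2]
```

In this model the index narrows "Heart Beat" down to rows 0, 1 and 4, and only those three rows are read to confirm the phrase. For "ends:", the index returns row 2 alone, so the row containing "sends" is never examined.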