sql-servert-sqlquery-optimization

Why does a WHERE IN query on a uniqueidentifer column result in a "range seek" (greater than/less than)?


I have a T-SQL query similar to this one (a bit more complex, simplifying this here):

SELECT *
FROM ParticipantTable p
WHERE p.OrganizationId = @p__linq__0
  AND p.ParticipantId IN (
    @p__linq__1,
    @p__linq__2,
    @p__linq__3,
    @p__linq__4,
    @p__linq__5,
    @p__linq__6,
    @p__linq__7,
    @p__linq__8,
    @p__linq__9,
    @p__linq__10
)

This results in a query plan similar to this one here (clipped to the relevant part): Picture of a query plan with Constant Scans, Compute Scalar, Concatenation, Compute Scalar, Sort (Top N Sort), Merge Interval and Nested Loops (Inner Join) with a clustered index seek

I was surprised by the Clustered Index Seek performing a greater than/less than comparison rather than a "normal" lookup per input key value:

Seek Keys[1]: Prefix: [snip].OrganizationId = Scalar Operator([@p__linq__0]); Start: [snip].ParticipantId > Scalar Operator([Expr1031]); End: [snip].ParticipantId < Scalar Operator([Expr1032])

Picture showing the Clustered Index Seek (Clustered) operator with the aforementioned seek predicate

Why's that? Isn't seeking a range on uniqueidentifier really inefficient, easily resulting in selecting a very broad range of values? Wouldn't it be much more efficient to loop over the input values and seek one by one? Or, with only ten values, pull the values directly into the seek operator?

I guess this might just be an optimization because the organization in question has a low number of participants, but wouldn't be pulling the values into the seek operator be more sensible anyway?

Also, just for academic interest, is there a way of forcing SQL Server to seek on the values rather than on the range? WITH (FORCESEEK([snip](OrganizationId, ParticipantId)) does not work, it will still seek with greater than/less than. I also added OPTIMIZE FOR UNKNOWN, but didn't really expect it to do anything here (and it didn't).


Solution

  • Wouldn't it be much more efficient to loop over the input values and seek one by one?

    This is what it is actually doing. The > and < shown in the execution plan is a bit misleading and an artefact of the underlying mechanism being generic functionality used for multiple cases.

    It is using the dynamic seek mechanism and passing in a flag of 62 which tells the storage engine to do an equality seek.

    (i.e. all five bits from values 2 to 32 inclusive are set per the preceding link and the table in my answer here)

    You will see this 62 when looking at the properties of the compute scalars between the concatenation and constant scan operators.

    enter image description here

    It is employed here for deduping the parametrised list and then doing a seek for each value that remains (as it would be an error to return the same row more than once if it was in the IN list more than once)

    For more about this execution plan pattern see Dynamic Seeks and Hidden Implicit Conversions