I have a table Credit:
```sql
create table Credit (ID_Credit int identity, ID_PayRequestStatus int, ... 20 more fields)
create nonclustered index Credit_ix_PayRequestStatus ON dbo.Credit(ID_PayRequestStatus)
```
The table has about 200k rows. The distribution of ID_PayRequestStatus is as follows:
| ID_PayRequestStatus | Number of rows |
|---|---|
| 400 | 198,000 |
| 300 | 1,000 |
| 200 | 490 |
| 100 | 450 |
| 999 | 250 |
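(For reference, a distribution like this can be produced with a simple aggregate:)

```sql
-- Count the rows per status value, most common first
select ID_PayRequestStatus, count(*) as NumberOfRows
from dbo.Credit
group by ID_PayRequestStatus
order by count(*) desc;
```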
If I run a query like this:
```sql
declare @ID_Status int = 200
select * from Credit where ID_PayRequestStatus = @ID_Status
```
It uses a scan of the clustered index (the primary key), and SSMS suggests that I create another index on the same column (ID_PayRequestStatus) but INCLUDE all the other columns in the table. This sounds like classic parameter sniffing.
If I run the following query to defeat parameter sniffing (note that the value I optimize for is not the parameter's actual value):
```sql
declare @ID_Status int = 200
select * from Credit
where ID_PayRequestStatus = @ID_Status
OPTION (OPTIMIZE FOR (@ID_Status=100))
```
It uses an index seek on the Credit_ix_PayRequestStatus index, just as I intended.
My main question is: why does comparing against a value that I didn't optimize for defeat parameter sniffing?
My secondary question is: why does SQL Server use parameter sniffing (e.g. the index scan) to begin with? It's a single table, the index is clearly defined, there are no joins, and there are no other criteria in the WHERE clause. I suspect it's because of the lopsided distribution of ID_PayRequestStatus, with 400 taking up 99% of the rows, and SQL Server thus deciding that a scan is cheaper. However, queries against this table with ID_PayRequestStatus = 400 constitute less than 1% of the total. Shouldn't auto-statistics have kicked in by now to resolve this performance problem?
When you filter by a local variable, parameter sniffing is suppressed, because the variable's value is not available at the time the query plan is generated.
So this
```sql
declare @ID_Status int = 200
select * from Credit where ID_PayRequestStatus = @ID_Status
```
is similar to
```sql
declare @ID_Status int = 200
select * from Credit
where ID_PayRequestStatus = @ID_Status
OPTION (OPTIMIZE FOR UNKNOWN)
```
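By contrast, if the value arrives as a real parameter (for example via sp_executesql or a stored procedure parameter), it is sniffed at compile time. A minimal sketch using the same table:

```sql
-- A true parameter IS sniffed at compile time, so this compiles a plan
-- for @ID_Status = 200 (histogram estimate ~490 rows -> index seek):
exec sp_executesql
    N'select * from Credit where ID_PayRequestStatus = @ID_Status',
    N'@ID_Status int',
    @ID_Status = 200;
```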
> My main question is: why does comparing against a value that I didn't optimize for defeat parameter sniffing?
`OPTION (OPTIMIZE FOR (@ID_Status=100))` optimizes for @ID_Status = 100, just as if parameter sniffing had sniffed that value: the plan is compiled using the histogram estimate for 100 (about 450 rows), which is low enough to make the seek on Credit_ix_PayRequestStatus plus bookmark lookups the cheaper plan.
> My secondary question is: why does SQL Server use parameter sniffing (e.g. the index scan) to begin with?
Again, the scan is not the result of parameter sniffing. The plan is optimized for an unknown value: with five distinct values, the density-based estimate is about 200,000 × (1/5) = 40,000 matching rows, a figure dominated by the most common value, 400. At that row count a clustered index scan is cheaper than a non-clustered index seek followed by a bookmark lookup on every matching row to fetch the other columns.
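If you want to see the numbers behind the unknown-value estimate, you can inspect the density vector of the index's statistics (a diagnostic sketch; the statistics name is assumed to match the index name, which is the default for index statistics):

```sql
-- "All density" for ID_PayRequestStatus should be ~0.2 (1 / 5 distinct values);
-- OPTIMIZE FOR UNKNOWN estimates rows as 200,000 * 0.2 = 40,000.
DBCC SHOW_STATISTICS ('dbo.Credit', 'Credit_ix_PayRequestStatus') WITH DENSITY_VECTOR;
```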
> Shouldn't auto-statistics have kicked in by now to resolve this performance problem?
Auto-stats will ensure SQL Server knows that 400 is the most common value. The fact that you rarely query for that value is not known to SQL Server.
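You can verify what the statistics actually know about the skew by looking at the histogram (same DBCC, different option):

```sql
-- The histogram shows per-value row counts: EQ_ROWS for the step with
-- RANGE_HI_KEY = 400 should be ~198,000, confirming the stats are current.
DBCC SHOW_STATISTICS ('dbo.Credit', 'Credit_ix_PayRequestStatus') WITH HISTOGRAM;
```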