I recently changed an index on a table and saw a huge loss in performance of one query. I would like to understand why this happened.
This is the query. The foreign key just like From
and To
are changing but the rest stays and is repeated often.
SELECT COUNT(*)
FROM Table_With_Values
WHERE FK_ObjectTheValuesBelongTo_Id = 460
AND [From]>=CONVERT([datetime2](3),'07.10.2024 00:00:00',(104))
AND [To]<=CONVERT([datetime2](3),'08.10.2024 00:00:00',(104))
At first the index for the table TableWithValues
looked like this:
CREATE NONCLUSTERED INDEX [Idx_TableWithValues_Fk_ObjectTheValuesBelongTo_Id_From_To] ON [dbo].[TableWithValues]
(
[Fk_ObjectTheValuesBelongTo_Id] ASC,
[From] ASC,
[To] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
Then I read the SQL Server documentation and saw this:
Consider the order of the columns if the index contains multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.
The From
and To
columns have around 140,000 distinct values while the foreign key column only has 1,600. Therefore I decided to change their order and made the index look like this:
CREATE NONCLUSTERED INDEX [Idx_TableWithValues_From_To_Fk_ObjectTheValuesBelongTo_Id] ON [dbo].[TableWithValues]
(
[From] ASC,
[To] ASC,
[Fk_ObjectTheValuesBelongTo_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
However, this caused the performance for this query to deteriorate. It took 1,000 times as much time to execute. I know this because I activated the query store on a copy of the database. Then I ran the IT system which sends the queries and checked the value in avg_duration
of the table query_store_runtime_stats
.
The table itself looks like this:
CREATE TABLE [dbo].[TableWithValues](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Fk_ObjectTheValuesBelongTo_Id] [int] NOT NULL,
[Value] [decimal](9, 3) NOT NULL,
[From] [smalldatetime] NOT NULL,
[To] [smalldatetime] NOT NULL,
CONSTRAINT [Pk_TableWithValues_Id] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY],
CONSTRAINT [Uq_TableWithValues_ObjectTheValuesBelongTo_Id_From] UNIQUE NONCLUSTERED
(
[Fk_ObjectTheValuesBelongTo_Id] ASC,
[From] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TableWithValues] WITH CHECK ADD CONSTRAINT [Fk_TableWithValues_ObjectTheValuesBelongTo_Id] FOREIGN KEY([Fk_ObjectTheValuesBelongTo_Id])
REFERENCES [dbo].[ObjectTheValuesBelongTo] ([Id])
GO
The query plan for runs with both indexes looks nearly the same. There are two differences. They use differente indexes (hidden in the screenshot) and the percentages are 13% and 87% (Fk first in Index) in one case and 20% and 80% (Fk last in index) in the other.
Any idea, why my database behaves not as one would expect when reading the documentation?
You missed the rest of that paragraph, which obliquely explains the real best way to index.
I've added numbering so you can see what it's trying to say:
Consider the order of the columns if the index contains multiple columns. The column that is used in the
WHERE
clause in:
- an equal to (=)
- greater than (>), less than (<), or BETWEEN search condition
- or participates in a join
should be placed first.
- Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.
So in your case, FK_ObjectTheValuesBelongTo_Id
is an equality, so goes first. This is irrespective of how distinct it is. Even if it's a bit
column, if it's an equality it goes first.
In my opinion, the decision about other columns (inequality, join and others) is mostly dependent on whether there are query requirements for grouping and sorting, and what the cardinality of the result after inequality predicates is. If the table is large and the predicate is filtering a lot out then sorting again later won't matter, whereas if it's only removing a small percentage then the later sorting will be slow and needs the index for optimization.
Any other columns not being searched, only selected, should be in the INCLUDE
, not the key. The order does not matter.
And if you have two or more inequalities or join columns, then you can't index for both. You need to decide which is going to filter the best, then put the rest of the columns as INCLUDE
s.
So the best index is either
(
[Fk_ObjectTheValuesBelongTo_Id] ASC,
[From] ASC
) INCLUDE (
[To]
)
or
(
[Fk_ObjectTheValuesBelongTo_Id] ASC,
[To] DESC
) INCLUDE (
[From]
)
Furthermore, the index is not being seeked correctly, because you are passing a datetime2
rather than smalldatetime
. You need to use the correct type.