sql-serverperformanceindexing

Why does the order of columns in the index has such a big impact in this case?


I recently changed an index on a table and saw a huge loss in performance of one query. I would like to understand why this happened.

This is the query. The foreign key just like From and To are changing but the rest stays and is repeated often.

SELECT COUNT(*) 
FROM Table_With_Values 
WHERE FK_ObjectTheValuesBelongTo_Id = 460 
AND [From]>=CONVERT([datetime2](3),'07.10.2024 00:00:00',(104)) 
AND [To]<=CONVERT([datetime2](3),'08.10.2024 00:00:00',(104))

At first the index for the table TableWithValues looked like this:

CREATE NONCLUSTERED INDEX [Idx_TableWithValues_Fk_ObjectTheValuesBelongTo_Id_From_To] ON [dbo].[TableWithValues]
(
    [Fk_ObjectTheValuesBelongTo_Id] ASC,
    [From] ASC,
    [To] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

Then I read the SQL Server documentation and saw this:

Consider the order of the columns if the index contains multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.

The From and To columns have around 140,000 distinct values while the foreign key column only has 1,600. Therefore I decided to change their order and made the index look like this:

CREATE NONCLUSTERED INDEX [Idx_TableWithValues_From_To_Fk_ObjectTheValuesBelongTo_Id] ON [dbo].[TableWithValues]
(
    [From] ASC,
    [To] ASC,
    [Fk_ObjectTheValuesBelongTo_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

However, this caused the performance for this query to deteriorate. It took 1,000 times as much time to execute. I know this because I activated the query store on a copy of the database. Then I ran the IT system which sends the queries and checked the value in avg_duration of the table query_store_runtime_stats.

The table itself looks like this:

CREATE TABLE [dbo].[TableWithValues](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Fk_ObjectTheValuesBelongTo_Id] [int] NOT NULL,
    [Value] [decimal](9, 3) NOT NULL,
    [From] [smalldatetime] NOT NULL,
    [To] [smalldatetime] NOT NULL,
 CONSTRAINT [Pk_TableWithValues_Id] PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY],
 CONSTRAINT [Uq_TableWithValues_ObjectTheValuesBelongTo_Id_From] UNIQUE NONCLUSTERED 
(
    [Fk_ObjectTheValuesBelongTo_Id] ASC,
    [From] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[TableWithValues]  WITH CHECK ADD  CONSTRAINT [Fk_TableWithValues_ObjectTheValuesBelongTo_Id] FOREIGN KEY([Fk_ObjectTheValuesBelongTo_Id])
REFERENCES [dbo].[ObjectTheValuesBelongTo] ([Id])
GO

The query plan for runs with both indexes looks nearly the same. There are two differences. They use differente indexes (hidden in the screenshot) and the percentages are 13% and 87% (Fk first in Index) in one case and 20% and 80% (Fk last in index) in the other. enter image description here

Any idea, why my database behaves not as one would expect when reading the documentation?


Solution

  • You missed the rest of that paragraph, which obliquely explains the real best way to index.

    I've added numbering so you can see what it's trying to say:

    Consider the order of the columns if the index contains multiple columns. The column that is used in the WHERE clause in:

    1. an equal to (=)
    2. greater than (>), less than (<), or BETWEEN search condition
    3. or participates in a join

    should be placed first.

    1. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.

    So in your case, FK_ObjectTheValuesBelongTo_Id is an equality, so goes first. This is irrespective of how distinct it is. Even if it's a bit column, if it's an equality it goes first.

    In my opinion, the decision about other columns (inequality, join and others) is mostly dependent on whether there are query requirements for grouping and sorting, and what the cardinality of the result after inequality predicates is. If the table is large and the predicate is filtering a lot out then sorting again later won't matter, whereas if it's only removing a small percentage then the later sorting will be slow and needs the index for optimization.

    Any other columns not being searched, only selected, should be in the INCLUDE, not the key. The order does not matter.

    And if you have two or more inequalities or join columns, then you can't index for both. You need to decide which is going to filter the best, then put the rest of the columns as INCLUDEs.

    So the best index is either

    (
        [Fk_ObjectTheValuesBelongTo_Id] ASC,
        [From] ASC
    ) INCLUDE (
        [To]
    )
    

    or

    (
        [Fk_ObjectTheValuesBelongTo_Id] ASC,
        [To] DESC
    ) INCLUDE (
        [From]
    )
    

    Furthermore, the index is not being seeked correctly, because you are passing a datetime2 rather than smalldatetime. You need to use the correct type.