sql-serverquery-optimizationdatabase-tuning-advisor

DB Engine Tuning Advisor suggestion improvement


We have a table which holds all email messages ready to send and which have already been sent. The table contains over 1 million rows.

Below is the query to find the messages which still need to be sent. After 5 errors the message is not attempted anymore and needs to be fixed manually. SentDate remains null until the message is sent.

SELECT TOP (15) 
    ID,
    FromEmailAddress,
    FromEmailDisplayName,
    ReplyToEmailAddress,
    ToEmailAddresses,
    CCEmailAddresses,
    BCCEmailAddresses,
    [Subject],
    Body,
    AttachmentUrl
FROM sysEmailMessage
WHERE ErrorCount < 5 
AND SentDate IS NULL
ORDER BY CreatedDate 

The query is slow, I assumed due to lacking indexes. I've offered the query to the Database Engine Tuning Advisor. It suggests the below index (and some statistics, which I generally ignore):

SET ANSI_PADDING ON

CREATE NONCLUSTERED INDEX [_dta_index_sysEmailMessage_7_1703677117__K14_K1_K12_5_6_7_8_9_10_11_15_17_18] ON [dbo].[sysEmailMessage]
(
    [SentDate] ASC,
    [ID] ASC,
    [ErrorCount] ASC
)
INCLUDE (   [FromEmailAddress],
    [ToEmailAddresses],
    [CCEmailAddresses],
    [BCCEmailAddresses],
    [Subject],
    [Body],
    [AttachmentUrl],
    [CreatedDate],
    [FromEmailDisplayName],
    [ReplyToEmailAddress]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

(On a sidenote: this index has a suggested size of 5,850,573 KB (?) which is neary 6 GB and doesn't make any sense to me at all.)

My question is does this suggested index make any sense? Why for example is the ID column included, while it's not needed in the query (as far as I can tell)? As far as my knowledge of indexes goes they are meant to be a fast lookup to find the relevant row. If I had to design the index myself I would come up with something like:

SET ANSI_PADDING ON

CREATE NONCLUSTERED INDEX [index_alternative_a] ON [dbo].[sysEmailMessage]
(
    [SentDate] ASC,
    [ErrorCount] ASC
)
WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

Is the optimizer really clever or is my index more efficient and probably better?


Solution

  • There's 2 different aspects for selecting an index, the fields you need for finding the rows (=actual indexed fields), and the fields that are needed after that (=included fields). If you're always doing top 15 rows, you can totally ignore included fields because 15 keylookups will be fast -- and adding the whole email to the index would make it huge.

    For the indexed fields, it's quite important to know how big percentage of the data matches your criteria.

    Assuming almost all of your rows have ErrorCount < 5, you should not have it in the index -- but if it's a rare case, then it's good to have.

    Assuming SentDate is really rarely NULL, then you should have that as the first column of the index.

    Having CreatedDate in the index depends on how many rows on average the are found from the table with the ErrorCount and SentDate criteria. If it is a lot (thousands) then it might help to have it there so the newest can be found fast.

    But like always, several things affect the performance so you should test how different options affect your environment.