sqlsql-serverindexing

SQL Server Index strategy for comparing date columns in a table


I have a SQL Server table with a recurring query that I need to improve the performance of it. I need to retrieve rows matching the WHERE clause shown here. All 3 columns are defined as datetime2 columns.

WHERE GREATEST(dateColumn1, dateColumn2) > dateColumn3

What is the best way to index the table to improve performance?

Should I create a computed column and index that?


Solution

  • What is the best way to index the table to improve performance?

    Make your WHERE predicate clause expression into a computed-column and then index that column.


    Last time I checked, SQL Server (incl. Azure SQL) does not support using arbitrary expressions in a CREATE INDEX statement, so any expressions you want to index must be defined as computed (but not necessarily PERSISTED) columns.

    ...I was doing this myself a few months ago, but I ran into a 😩-inducing bug: you actually need to use PERSISTED if you use LEAST or GREATEST in a computed-column expression in SQL Server 2022 or Azure SQL.


    Now, because you'll need to define a computed-column that'll be indexed as-is then consider that you could either...

    1. Define the computed-column simply as the result of GREATEST( col1, col2, etc ) - which means you'll still need the > comparison in your SELECT query, and potentially experience poor query-plan selection if even the slightest change happens to your DB.

    2. ...or: define another computed-column using the result of the > expression, and INDEX over that; you'll need to update your SELECT queries to directly reference the new computed-columns, but this significantly reduces the odds of SQL Server generating a bad query-plan (but YMMV, ofc).

    (I assume your dateCol1 and dateCol2 columns are both datetime2(7) NOT NULL. (If they actually are NULL-able then this answer probably won't work for you because the computed-column and index here won't handle dateCol1 IS NOT NULL or dateCol1 IS NOT DISTINCT FROM dateCol2-style predicates)

    Anyway, like so:

    -- Before doing anything, ensure we're in a deterministic and standards-compliant environment, which is important for PERSISTED columns:
    
    SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
    
    SET XACT_ABORT ON; /* See https://stackoverflow.com/questions/1150032/what-is-the-benefit-of-using-set-xact-abort-on-in-a-stored-procedure */
    
    BEGIN TRANSACTION ddlTxn;
    
    RAISERROR( 'Now running ALTER TABLE...', 0, 1) WITH NOWAIT;
    
    ALTER TABLE dbo.MyTable
        ADD
            Greatest_1_or_2        AS GREATEST( dateCol1, dateCol2 ) PERSISTED NOT NULL,
            DateCol_1_or_2_is_gt_3 AS ( CASE WHEN GREATEST( dateCol1, dateCol2 ) > dateCol3 THEN CONVERT(bit,1) ELSE CONVERT(bit,0) END ) PERSISTED NOT NULL,
            DateCol_1_or_2_diff_3  AS ( DATEDIFF( microsecond, GREATEST( dateCol1, dateCol2 ) - dateCol3 ) PERSISTED NOT NULL;
    
    GO /* This `GO` is to stop SSMS complaining that the computed cols ( defined above, but used below) won't exist at parse-time. */
    
    RAISERROR( 'Now running CREATE INDEX...', 0, 1) WITH NOWAIT;
    
    CREATE INDEX IX_Greatest12 ON dbo.MyTable ( Greatest_1_or_2 ) INCLUDE ( dateCol3 );
    
    CREATE INDEX IX_Greatest12IsGT3 ON dbo.MyTable ( DateCol_1_or_2_is_gt_3 );
    
    CREATE INDEX IX_DateCol12Diff3 ON dbo.MyTable ( DateCol_1_or_2_diff_3 );
    
    RAISERROR( 'COMMITing TRANSACTION...', 0, 1) WITH NOWAIT;
    
    COMMIT TRANSACTION ddlTxn;
    
    RAISERROR( 'Ding!', 0, 1) WITH NOWAIT;
    

    Notice that I defined DateCol_1_or_2_is_gt_3 and DateCol_1_or_2_diff_3 - and associated IX_ indexes - so you'll likely want to choose one bunch or the other (or both?) based on your actual queries.

    So your queries might now look like this:

    SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.Greatest_1_or_2 > t.dateCol3
    

    or:

    SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.DateCol_1_or_2_is_gt_3 = 1;
    

    or:

    SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.DateCol_1_or_2_diff_3 < 0;
    

    You'll need to test-out and benchmark these 3 different variations for yourself because whichever is faster or preferable overall depends entirely on how the rest of your DB works, and from my vantage-point here, on the Internet, I can't see anything :)