sqlsql-serverfiltered-index

Should I create filtered index for column that is mostly null


I have a table with millions of rows and bunch of columns that are mostly empty:

CREATE TABLE MyTable
(
    [Id] int NOT NULL IDENTITY(1,1),
    [MostlyEmptyColumnA] varchar(30) NULL,
    [MostlyEmptyColumnB] varchar(30) NULL,
    [MostlyEmptyColumnC] varchar(30) NULL,
    [MostlyEmptyColumnD] varchar(30) NULL,

    CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([Id]),
)

However I often query

SELECT ... 
FROM MyTable 
WHERE MostlyEmptyColumnA = @parameterA

and so on for the other columns.

Is it recommended that I create index for each of those columns?

Does it make sense to create filtered index in order to reduce index size? (assuming @parameterA always has NOT NULL value, so I never have MostlyEmptyColumnA IS NULL condition in where clause)

CREATE INDEX IxMostlyEmptyColumnA
ON MyTable (MostlyEmptyColumnA)
WHERE MostlyEmptyColumnA IS NOT NULL;

Solution

  • Is it recommended that I create index for each of those columns?

    We can't provide any such recommendation for you.

    You need to determine whether the benefits provided by such indexes outweigh the costs of the indexes when considering the overall mix of queries in your workload.

    Does it make sense to create filtered index in order to reduce index size?

    If you have already determined that you need to provide such indexes though (so the choice is just between filtered and unfiltered) then filtered are almost certainly a more appropriate choice in this case than unfiltered.

    As you "never have MostlyEmptyColumnA IS NULL condition in where clause" then storing these index rows for the NULL case will just increase the index size and the cost of maintaining these indexes (e.g. number of writes to index pages and transaction log when creating the index in the first place and for subsequent write operations against the base table).

    The proposed filtered index can be matched against the predicate [MostlyEmptyColumnA] = @parameterA

    enter image description here

    So the filtered index is capable of meeting the stated querying requirement. As usual whether or not the index is actually used will depend on cost based optimisation (in the case that the index does not cover all columns required for the query this will involve costing the estimated lookups vs cost of just scanning a covering index that may be unfiltered).

    SQL Server does not care about the possibility that @parameterA might be NULL as in that case the predicate won't match any rows anyway so this doesn't block it producing an execution plan using the filtered index.

    (If you used [MostlyEmptyColumnA] IS NOT DISTINCT FROM @parameterA this would be a problem however!)

    For write operations against MyTable if SQL Server can detect at compile time that the values being written won't affect the filtered index then it can omit any operators to maintain the filtered index (e.g. as inserting rows with literal NULL or only updating specific columns not included in the index).

    If the plan does require filtered index maintenance then the execution plan might either use a "narrow" plan and just have one execution plan operator responsible for updating all indexes. Or a "wide" plan and have the clustered index updated first and then the rows emitted from that and the filtered index predicate applied before making the relevant writes to the filtered index.

    If there are multiple filtered indexes to update then the wide plan would spool all of the updated rows (even if they match none of the filtered index conditions) so it can replay this spool for all of the indexes. This spooling can certainly make a noticeable difference to execution timings of large updates.

    The above is not specific to filtered indexes though and non filtered ones would have even more potential impact on writes!

    It is up to you to evaluate the costs vs benefits for your workload.