sqlsql-servert-sqlclustered-indexnon-clustered-index

Speeding ORDER BY clause with index


I have a query with an ORDER BY clause that is slow due to the table having over 11 million rows.

I can speed it up dramatically by adding a clustered index on the column in the ORDER BY clause. However, the software creates the query to order by different columns, depending on user settings. And you cannot add more than one clustered index to a table.

My question is: can non-clustered indexes be used to improve ORDER BY performance? Or is there something special about clustered indexes that means I will not be able to sort quickly for all columns?

Note: I've posted my real query and execution plan online but there are other issues that I don't want to go into here. I didn't create the database or write the query. And the query is still very slow even without the IN clause.


Solution

  • Non-clustered indexes can absolutely be used to optimize away a sort. Indexes are essentially binary search trees, which means they contain the values sorted in order.

    However, depending on the query, you may be putting SQL Server into a conundrum.

    If you have a table with 100 million rows, your query will match 11 million of them, like below, is it cheaper to use an index on category to select the rows and sort the results by name, or to read all 100 million rows out of the index pre-sorted by name, and then filter down 89 million of them by checking the category?

    select ...
    from product
    where category = ?
    order by name;
    

    In theory, SQL Server may be able to use an index on name to read rows in order and use the index on category to filter efficiently? I'm skeptical. I have rarely seen SQL Server use multiple indexes to access the same table in the same query (assuming a single table selection, ignoring joins or recursive CTE's). It would have to check the index 100 million times. Indexes have a high overhead cost per index search, so they are efficient when a single search narrows down the result set by a lot.

    Without seeing a schema, statistics, and exact query, it's hard for me to say what makes sense, but I expect I would find SQL Server would use an index for the where clause and sort the results, ignoring an index on the sort column.

    An index on the sort column may be used if you are selecting the entire table though. Like select ... from product order by name;

    Again, your milage may vary. This is speculation based off past experience.