I have a SQL server table with ~10-100s of millions of rows which acts as a fact table in a star schema. An example definition would look like this:
CREATE TABLE fact (
[ImportID] INT NOT NULL,
[Dim1ID] INT NOT NULL,
[Dim2ID] INT NOT NULL,
[Dim3ID] INT NOT NULL,
[Dim4ID] INT NOT NULL,
[dData] FLOAT(53)
CONSTRAINT PK_Data_Dims PRIMARY KEY CLUSTERED ([Dim4ID],[Dim3ID],[Dim2ID],[Dim1ID]),
INDEX IX_Data_Imports NONCLUSTERED ([ImportID])
)
I need this table to be performant when:
SELECT [Dim1ID], [Dim2ID], [Dim3ID], [Dim4ID], [dData]
FROM fact
WHERE [ImportID] = X
ORDER BY [Dim4ID],[Dim3ID],[Dim2ID],[Dim1ID]
I have included the indexes I would expect to want in the example definition but ORDER BY performance would seem to depend on how duplicate keys in the nonclustered index are ordered.
I have also considered using a heap table with non-clustered indices for both. (In practice, there are 13 INT dimension IDs so the key size with clustered PK_Data_Dims will be rather large.) I think this is likely to be slower, however, as the second dimension in the index is correlated with the time the records are inserted. (The first does not have many distinct values so should not fragment the imported records too greatly.) Therefore, I would expect there to be some spatial locality of the records in an import when they are inserted into the clustered index.
I may be greatly overthinking this but I see a few more situations like this coming up for me so I would appreciate some input on best practices or some additional factors that I might want to consider.
A non-clustered index automatically includes the clustering key, even if you don't specify it. It is added in the same order as the clustering key.
You can however change the order explicitly, by adding those columns to your non-clustered index's key.
You can see this in action in this fiddle.
So in your particular case:
WHERE
) and the ORDER BY
requirement is in the same order as the clustering key. Again, if the ORDER BY
changes then you can either reorder the primary/clustering key (with obviously no loss in uniqueness) or you can reorder the columns in the NCI by adding them explicitly in the key.The size of the clustering key is a bit of a concern: it shouldn't be very wide normally. This is mainly because it can increase the depth of the B+tree, but also because the clustering key is used in every NC index also. This is less of a concern in your case, as you appear to need those columns in the NCI anyway.