sql-servercolumnstore

What is the difference between clustered and non-clustered columnstore index with respect to rowstore data page concept


  1. In row based clustered index: the entire table data (all columns) is ordered by clustered index column. Each page holds a particular amount of rows and all the column.

  2. In row based non-clustered index - a data structure is created that holds the index column. Each page of the indexed column holds the row-wise values for that column and each row points back to the (another page) clustered index table key row or heap row (if no clustered index exists) for rest of the data.

I understand the concept of columnstore index - in the sense that there are row groups. Each row group contains one column segment (compressed) for every column in the table and there is a delta store to hold the Inserts/Updates until the next tuple mover process gets invoked. Based on the above 2 points of rowstore index (page), please can you tell me how it works in case of clustered and non-clustered columnstore index.

Example - In case of Columnstore non-clustered index storage, is it conceptually same as rowstore non-clustered index - that is - separate page for the index column, the values of which points to heap or clustered index key.


Solution

  • In comparing rowstore vs columnstore, the term clustered means all columns and non-clustered means some columns (unless one included all columns). There is no other similarly between the disparate architectures of rowstore/columnstore organization. I personally don't even use the word index at all when referring to columnstore since the structure is optimized for scans rather than lookups and often leads to confusion.

    Columnstore index segments, whether clustered or not, are essentially just compressed blobs of data stored in pages/extents. Rowstores, OTOH, have a record structure for each row to accommodate multiple columns of varying types and nullability, which is why they do not compress as well as columnstore data.