sql-serversql-server-2008heapclustered-indexheap-table

SQL Server heap v.s. clustered index


I am using SQL Server 2008. I know if a table has no clustered index, then it is called heap, or else the storage model is called clustered index (B-Tree).

I want to learn more about what exactly means heap storage, what it looks like and whether it is organized as "heap" data structure (e.g. minimal heap, maximum heap). Any recommended readings? I want to more a bit more internals, but not too deep. :-)

thanks in advance, George


Solution

  • Heap storage has nothing to do with these heaps.

    Heap just means records themselves are not ordered (i. e. not linked to one another).

    When you insert a record, it just gets inserted into the free space the database finds.

    Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

    If you create a secondary index on a HEAP table, the RID (a kind of a physical pointer to the storage space) is used as a row pointer.

    Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

    Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

    If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

    This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called uniquifier is appended to the index key that makes if unique (and larger in size).

    It is also worth noting that creating a secondary index on a column makes the values or the clustered index's key to be the part of the secondayry index's key.

    By creating an index on a clustered table, you in fact always get a composite index

    CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 (col1, col2, col3, col4)
    
    CREATE INDEX IX_mytable_5678 (col5, col6, col7, col8)
    

    Index IX_mytable_5678 is in fact an index on the following columns:

    col5
    col6
    col7
    col8
    col1
    col2
    col3
    col4
    

    This has one more side effect:

    A DESC condition in a single-column index on a clustered table makes sense in SQL Server

    This index:

    CREATE INDEX IX_mytable ON mytable (col1)
    

    can be used in a query like this:

    SELECT  TOP 100 *
    FROM    mytable
    ORDER BY
           col1, id
    

    , while this one:

    CREATE INDEX IX_mytable ON mytable (col1 DESC)
    

    can be used in a query like this:

    SELECT  TOP 100 *
    FROM    mytable
    ORDER BY
           col1, id DESC