sql-serverblobsql-server-2016lobheap-table

SQL Server - Deleting/Updating LOB data in a Heap


I have a SQL Server 2016 database with RCSI enabled that is literally a heap of heaps. With the exception of one table, every other table in the database is a heap and the largest heap is ~200GB which makes up over 50% of the total size of the database.

This particular large heap has two lob columns, both with the varbinary(max) data type. The heap also has a number of non-clustered indexes, thankfully the varbinary(max) columns are not present in any of these non-clustered indexes and thus they are relatively small in size.

The vendor has supplied a clean up script which runs from an application server and purges data from this large heap. After some investigation, I've found that this clean up script does not delete entire rows but instead sets one of the varbinary(max) columns to null based on certain criteria.

Here are some details regarding the heap:

enter image description here

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(N'<database>'), OBJECT_ID(N'GrimHeaper>'),0, null, 'DETAILED');

enter image description here

enter image description here

SELECT * FROM sys.dm_db_index_operational_stats(db_id('<database>'),object_id('GrimHeaper'),0,null);

enter image description here

enter image description here

enter image description here

My understanding in this case is that the space freed by setting the value in the lob column to null will not be automatically re-claimed, this is the behaviour regardless of whether the table is a heap or clustered, please correct me if I am wrong.

In this Microsoft article and also this article it states the below with regards to the index reorganise operation:

REORGANIZE ALL performs LOB_COMPACTION on all indexes. For each index, this compacts all LOB columns in the clustered index, underlying table, or included columns in a nonclustered index.

When ALL is specified, all indexes associated with the specified table or view are reorganized and all LOB columns associated with the clustered index, underlying table, or nonclustered index with included columns are compacted.

I find these statements ambiguous and not very clear. Can anyone confirm that if I ran the “ALTER INDEX ALL ON REORGANISE WITH ( LOB_CAMPACTION = ON )” statement that it would compact the varbinary(max) LOB column(s) even though they are not present in any of the non-clustered indexes and only in the underlying heap? The rationale behind this would be to reclaim any space freed by the application job which sets the LOB column to null for qualifying rows.

Additionally, you can also see that this heap has a number of forwarded records. I also suspect that entire rows have been deleted from the heap, but have not been de-allocated due to the known behaviour of deletes against heaps where rows are only de-allocated when a table lock is taken either explicitly through a table lock query hint or via lock escalation. Considering this, I am thinking about disabling all the non-clustered indexes on the heap, rebuilding the heap and then re-enabling the non-clustered indexes. Would this operation also re-claim/compact any unused space in the lob column as well as removing the forwarded records and deleted but not fully de-allocated rows?

Disclaimer - this database is designed by a vendor, creating clustered indexes isn't acceptable. The application that uses this database isn't used at weekends and thus I have large maintenance windows so while re-building the heap may be resource intensive and painful, it is feasible.


Solution

  • Can anyone confirm that if I ran the “ALTER INDEX ALL ON REORGANISE WITH ( LOB_CAMPACTION = ON )” statement that it would compact the varbinary(max) LOB column(s) even though they are not present in any of the non-clustered indexes and only in the underlying heap?

    Yes. You can easily confirm this empirically, and we'll do so in a minute.

    The rationale behind this would be to reclaim any space freed by the application job which sets the LOB column to null for qualifying rows.

    LOB compaction does not literally reclaim all space freed. Even rebuilding the whole table will not reclaim LOB space -- reorganizing is the best you can do, and that does not reclaim everything. If it makes you feel better: this is not restricted to heap tables, and it's actually a feature, not a bug.

    Let me prove it. Let's create a heap table with LOB data:

    CREATE TABLE heap_of_trouble(ID INT IDENTITY, lobby VARBINARY(MAX));
    
    -- SQL Server will store values <8K in the row by default; force the use of LOB pages
    EXEC sp_tableoption 'heap_of_trouble', 'large value types out of row', 1;
    
    SET NOCOUNT ON;
    GO
    BEGIN TRANSACTION;
    GO
    INSERT heap_of_trouble(lobby) VALUES (CONVERT(VARBINARY(MAX), REPLICATE(' ', 4000)));
    GO 10000
    COMMIT;
    
    SELECT p.[rows], p.index_id, au.[type_desc], au.data_pages, au.total_pages, au.used_pages
    FROM sys.partitions p 
    JOIN sys.allocation_units au ON au.container_id = p.hobt_id
    JOIN sys.objects o ON o.[object_id] = p.[object_id]
    WHERE o.[name] = 'heap_of_trouble'
    
    +-------+----------+-------------+------------+-------------+------------+
    | rows  | index_id |  type_desc  | data_pages | total_pages | used_pages |
    +-------+----------+-------------+------------+-------------+------------+
    | 10000 |        0 | IN_ROW_DATA |         43 |          49 |         44 |
    | 10000 |        0 | LOB_DATA    |          0 |        5121 |       5118 |
    +-------+----------+-------------+------------+-------------+------------+
    

    Let's clear out some columns:

    UPDATE heap_of_trouble SET lobby = NULL WHERE ID % 2 = 0;
    

    And let's get the page count again:

    +-------+----------+-------------+------------+-------------+------------+
    | rows  | index_id |  type_desc  | data_pages | total_pages | used_pages |
    +-------+----------+-------------+------------+-------------+------------+
    | 10000 |        0 | IN_ROW_DATA |         43 |          49 |         44 |
    | 10000 |        0 | LOB_DATA    |          0 |        5121 |       5117 |
    +-------+----------+-------------+------------+-------------+------------+
    

    No change, except for one page at the end. That's expected. So now let's reorganize and compact:

    ALTER INDEX ALL ON heap_of_trouble REORGANIZE WITH (LOB_COMPACTION = ON);
    
    +-------+----------+-------------+------------+-------------+------------+
    | rows  | index_id |  type_desc  | data_pages | total_pages | used_pages |
    +-------+----------+-------------+------------+-------------+------------+
    | 10000 |        0 | IN_ROW_DATA |         43 |          49 |         44 |
    | 10000 |        0 | LOB_DATA    |          0 |        3897 |       3897 |
    +-------+----------+-------------+------------+-------------+------------+
    

    You'll notice the number of pages is not half of what we started with: the LOB data has been reorganized, but not fully rebuilt.

    If you try ALTER TABLE .. REBUILD instead, you will notice that the LOB data is not compacted at all:

    +-------+----------+-------------+------------+-------------+------------+
    | rows  | index_id |  type_desc  | data_pages | total_pages | used_pages |
    +-------+----------+-------------+------------+-------------+------------+
    | 10000 |        0 | IN_ROW_DATA |         29 |          33 |         30 |
    | 10000 |        0 | LOB_DATA    |          0 |        5121 |       5117 |
    +-------+----------+-------------+------------+-------------+------------+
    

    Note how the IN_ROW_DATA has been rebuilt, but the LOB data has been left completely untouched. You can try this with a clustered index as well (simply make the ID a PRIMARY KEY to implicitly create one). However, this is not true for non-clustered indexes. Start over, but this time add another index:

    CREATE INDEX IX_heap_of_trouble_ID ON heap_of_trouble (ID) INCLUDE (lobby)
    

    Including LOB data in an index is not a normal setup, of course; this is just for illustration. And look what we get after ALTER TABLE REBUILD:

    +-------+----------+-------------+------------+-------------+------------+
    | rows  | index_id |  type_desc  | data_pages | total_pages | used_pages |
    +-------+----------+-------------+------------+-------------+------------+
    | 10000 |        0 | IN_ROW_DATA |         29 |          33 |         30 |
    | 10000 |        0 | LOB_DATA    |          0 |        5121 |       5117 |
    | 10000 |        2 | IN_ROW_DATA |         35 |          49 |         37 |
    | 10000 |        2 | LOB_DATA    |          0 |        2561 |       2560 |
    +-------+----------+-------------+------------+-------------+------------+
    

    Surprise (maybe), the LOB data of the non-clustered index is rebuilt, not merely reorganized. ALTER INDEX ALL .. REBUILD will have the same effect, but will leave the heap completely untouched. To sum up with a little table:

    +----------------------+---------------+-------------------+----------------------+
    |                      | TABLE REBUILD | INDEX ALL REBUILD | INDEX ALL REORGANIZE |
    +----------------------+---------------+-------------------+----------------------+
    | Heap in-row          | Rebuild       | -                 | -                    |
    | Heap LOB             | -             | -                 | Reorganize           |
    | Clustered in-row     | Rebuild       | Rebuild           | Reorganize           |
    | Clustered LOB        | -             | -                 | Reorganize           |
    | Non-clustered in-row | Rebuild       | Rebuild           | Reorganize           |
    | Non-clustered LOB    | Rebuild       | Rebuild           | Reorganize           |
    +----------------------+---------------+-------------------+----------------------+
    

    I am thinking about disabling all the non-clustered indexes on the heap, rebuilding the heap and then re-enabling the non-clustered indexes.

    You do not need to separately re-enable non-clustered indexes; ALTER TABLE .. REBUILD rebuilds all indexes as well, and disabled indexes will be re-enabled as part of the rebuild.

    Would this operation also re-claim/compact any unused space in the lob column as well as removing the forwarded records and deleted but not fully de-allocated rows?

    Per our earlier results, no, not exactly. If you're satisfied with merely having the LOB data compacted with the rest of the table rebuilt, the procedure for that would be:

    1. Perform ALTER INDEX ALL .. DISABLE to disable all non-clustered indexes;
    2. Perform ALTER INDEX ALL .. REORGANIZE WITH (LOB_COMPACTION = ON) to compact LOB pages of the underlying heap (this will leave the disabled indexes alone);
    3. Perform ALTER TABLE .. REBUILD to rebuild the in-row data of the heap, as well as all data of the indexes, and re-enable them.

    If you really want to shrink the heap down to its minimum size, you'll have to create a new table and insert the data there, but that involves a lot more scripting and judicious use of sp_rename. It's also very expensive, since it requires copying all the LOB data (something which REORGANIZE avoids). If you do this without paying attention to filegroups and log space used, you can end up consuming more space than you seek to reclaim, and it's unlikely to help with performance.