I have a table with an integer primary key that needs to be changed to a bigint. Thanks to other answers I found multiple solutions for this problem, but when I compared their speed, the result seemed unintuitive to me.
I expected this to be very fast:
ALTER TABLE tablename DROP CONSTRAINT constraint_name;

ALTER TABLE tablename ALTER COLUMN Id BIGINT NOT NULL;

ALTER TABLE tablename
ADD CONSTRAINT constraint_name
PRIMARY KEY NONCLUSTERED ([Id] ASC) WITH (
      PAD_INDEX = OFF
    , STATISTICS_NORECOMPUTE = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
However, the approach below was four times faster than the one above.
SELECT * INTO newtable FROM tablename WHERE 1 = 0;

ALTER TABLE newtable
ALTER COLUMN Id BIGINT NOT NULL;

SET IDENTITY_INSERT newtable ON;

INSERT INTO newtable
    (Id, all_the_other_column_names)
SELECT *
FROM tablename;

-- rename the tables properly
I found explanations arguing that something like the following is going on: MS SQL Server stores data row by row. If you change the size of one column, the data in the other columns of each row has to be shifted to make room for the now wider bigint value, so every existing row is rewritten in place. If you instead insert into an empty table that already has the correct data types, the copy is more efficient than this in-place shifting.
Is this really the cause for the described difference in efficiency?
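One way to look at this yourself is to run the in-place widening inside an explicit transaction and check how much transaction log it generates; an operation that rewrites every row writes far more log than a metadata-only change would. This is a minimal sketch, assuming the RandomTable from the generator further down, that its auto-generated primary key constraint has already been dropped (as in the first approach above), and that you have permission to query the transaction DMVs:
-- Sketch: measure how much transaction log the in-place ALTER generates.
-- Assumes RandomTable exists and its primary key constraint was dropped first.
BEGIN TRANSACTION;

ALTER TABLE RandomTable ALTER COLUMN ID BIGINT NOT NULL;

-- Log bytes written by the current transaction in this database:
SELECT dt.database_transaction_log_bytes_used
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_database_transactions AS dt
    ON dt.transaction_id = st.transaction_id
WHERE st.session_id = @@SPID
  AND dt.database_id = DB_ID();

ROLLBACK TRANSACTION;  -- undo the change so the test table stays as it was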
Feel free to use the table generator below to create a table for testing.
-- Declare variables
DECLARE @RowCount INT = 10; -- Adjust this variable to control the number of rows
DECLARE @Counter INT = 1;
-- Create a test table
CREATE TABLE RandomTable (
ID INT PRIMARY KEY,
RandomString NVARCHAR(10)
);
-- Loop to insert rows into the table
WHILE @Counter <= @RowCount
BEGIN
-- Insert random data into the table
INSERT INTO RandomTable (ID, RandomString)
VALUES (
@Counter,
LEFT(CAST(NEWID() AS NVARCHAR(36)), 10)
);
-- Increment counter
SET @Counter = @Counter + 1;
END
-- Display the contents of the table
SELECT * FROM RandomTable;
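Ten rows will not show any measurable timing difference. If you want to fill the table with enough rows for a comparison, a set-based insert is much faster than the loop above. This is just a sketch with an arbitrary row count, assuming the same RandomTable definition and an empty table (drop and recreate it first if you already ran the loop):
-- Sketch: set-based fill of RandomTable with @RowCount rows.
DECLARE @RowCount INT = 1000000;

WITH Numbers AS (
    SELECT TOP (@RowCount)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO RandomTable (ID, RandomString)
SELECT n,
       LEFT(CAST(NEWID() AS NVARCHAR(36)), 10)
FROM Numbers;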
The name of the primary key constraint created this way can be found with:
SELECT name
FROM sys.key_constraints
WHERE type = 'PK'
  AND name LIKE '%Random%';
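If you don't want to copy the auto-generated name by hand, you can feed it straight into a dynamic DROP CONSTRAINT. A small sketch, again assuming the RandomTable test table (the variable names are my own):
-- Sketch: drop the auto-named primary key constraint on RandomTable via dynamic SQL.
DECLARE @pk SYSNAME, @sql NVARCHAR(MAX);

SELECT @pk = name
FROM sys.key_constraints
WHERE type = 'PK'
  AND parent_object_id = OBJECT_ID('RandomTable');

SET @sql = N'ALTER TABLE RandomTable DROP CONSTRAINT ' + QUOTENAME(@pk) + N';';
EXEC sys.sp_executesql @sql;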
Well, in the context of your question there are several things to consider. First of all, your indexes, whether they are regular indexes, unique indexes, or primary key indexes, are organized as a roadmap for future searches. If your columns have a given size and that size increases, the way those indexes are organized internally may have to change.
Think of it intuitively: if you have integer values up to 255, it makes sense to take the base-2 logarithm of 256 and divide your future values into 8 partitions. If, with such an organization already in place, your maximum suddenly grows to 65535, you will likely need to reorganize your data.
Besides that, data is stored in fixed-size data pages (see https://dba.stackexchange.com/questions/142462/choosing-the-right-storage-block-size-for-sql-server), and rows are laid out on those pages accordingly. If you increase the size of some columns, rows may no longer fit on their current pages, so updating the existing pages in place is not enough: new pages have to be allocated and logically linked to the rows they belong to.
This page reorganization, like index reorganization, is time-consuming. It is usually simpler to rebuild something from scratch than to reorganize it in place while making sure every detail stays consistent.
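If you want to see this effect at the page level, one option (a sketch, assuming the RandomTable test table from the question) is to compare what sys.dm_db_index_physical_stats reports before and after the in-place ALTER; once every row has been widened, the average record size and page count grow:
-- Sketch: inspect record size, page count and fragmentation of RandomTable.
-- Run once before and once after the ALTER COLUMN and compare the results.
-- DETAILED (or SAMPLED) mode is needed so the size columns are populated.
SELECT index_id,
       index_type_desc,
       page_count,
       avg_record_size_in_bytes,
       avg_page_space_used_in_percent,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('RandomTable'), NULL, NULL, 'DETAILED');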