sql-servert-sqlcasecalculated-columnsnewid

Inconsistent results with NEWID() and PERSISTED computed column


I am getting odd results when using NEWID() in combination with a persistent computed column. Am I using some function wrong?

Not using persisted when creating the column, and therefore calculating values when selecting them, will return correct values. Updating the column (col1) will also return correct values.

DECLARE @test TABLE (
    Col1 INT,
    Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED)

INSERT INTO @test (Col1) VALUES
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5))

SELECT * FROM @test
UPDATE @test SET Col1 = Col1*1
SELECT * FROM @test

/*
Col1    Contains2
2   0
2   0
0   1
4   0
3   0

Col1    Contains2
2   1
2   1
0   0
4   0
3   0
*/

Solution

  • Apparently, the query engine calculates the random number twice for each row.

    First time for Col1, second time for the CASE statement of the persisted column.

    Optimiser doesn't know, or doesn't care in this case that NEWID is a non-deterministic function and calls it twice.

    Actually, it may even not have a choice. Do you want optimiser to create a temporary table behind the scenes, populate its Col1 with results of the expression that generates random numbers, then read that temporary table back and use these saved intermediary results for calculating the result of the CASE expression, then perform the final INSERT? In this case, it is cheaper for optimiser to calculate the expression twice without writing intermediary results to disk. In some other cases (say, when you have not 5, but 5 billion rows, or extra indexes), the estimated costs may be different and this behaviour would change.

    I don't think you can do much about it. Just be aware of this behaviour. Always explicitly save the generated set of random numbers to a table, then perform further calculations based on them.

    I reproduced it in SQL Server 2008 and 2014. Here is an execution plan that I got in SQL Server 2008, but it is not really interesting. In 2014 the plan is the same, except there is no Top operator.

    plan 2008

    Constant Scan operator outputs a Union1009 list, which is used in Compute Scalar later. I guess, it comes down to implementation details of Constant Scan and/or Compute Scalar operators.

    Observed behaviour tells us that newid() is called twice per row here.