I am getting odd results when using NEWID() in combination with a persistent computed column. Am I using some function wrong?
Not using persisted when creating the column, and therefore calculating values when selecting them, will return correct values. Updating the column (col1) will also return correct values.
DECLARE @test TABLE (
Col1 INT,
Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED)
INSERT INTO @test (Col1) VALUES
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5))
SELECT * FROM @test
UPDATE @test SET Col1 = Col1*1
SELECT * FROM @test
/*
Col1 Contains2
2 0
2 0
0 1
4 0
3 0
Col1 Contains2
2 1
2 1
0 0
4 0
3 0
*/
Apparently, the query engine calculates the random number twice for each row.
First time for Col1
, second time for the CASE
statement of the persisted column.
Optimiser doesn't know, or doesn't care in this case that NEWID
is a non-deterministic function and calls it twice.
Actually, it may even not have a choice. Do you want optimiser to create a temporary table behind the scenes, populate its Col1
with results of the expression that generates random numbers, then read that temporary table back and use these saved intermediary results for calculating the result of the CASE
expression, then perform the final INSERT
? In this case, it is cheaper for optimiser to calculate the expression twice without writing intermediary results to disk. In some other cases (say, when you have not 5, but 5 billion rows, or extra indexes), the estimated costs may be different and this behaviour would change.
I don't think you can do much about it. Just be aware of this behaviour. Always explicitly save the generated set of random numbers to a table, then perform further calculations based on them.
I reproduced it in SQL Server 2008 and 2014.
Here is an execution plan that I got in SQL Server 2008, but it is not really interesting. In 2014 the plan is the same, except there is no Top
operator.
Constant Scan
operator outputs a Union1009
list, which is used in Compute Scalar
later. I guess, it comes down to implementation details of Constant Scan
and/or Compute Scalar
operators.
Observed behaviour tells us that newid()
is called twice per row here.