sql-serversql-server-2022newid

Generate and repeat NEWID() values in a single query


I am trying to generate and then duplicate multiple UNIQUEIDENTIFIER values using NEWID() for use as test data. The following is my attempt is to first generate N1 values and then use a CROSS APPLY to duplicate these values N2 times each:

SELECT *
FROM (
    SELECT
        S1.value,
        NEWID() AS Id
    FROM GENERATE_SERIES(1, 3) S1
) S1
CROSS APPLY (
    SELECT
        S2.value,
        S1.Id -- Trying to repeat the same ID value multiple times
    FROM GENERATE_SERIES(1, 2) S2
) S2
ORDER BY S1.Value, S2.Value

Desired results:

value Id value Id
1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb 1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb
1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb 2 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb
1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb 3 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb
2 4743c229-0c3c-44fd-b9f8-9406b06e350e 1 4743c229-0c3c-44fd-b9f8-9406b06e350e
2 4743c229-0c3c-44fd-b9f8-9406b06e350e 2 4743c229-0c3c-44fd-b9f8-9406b06e350e
2 4743c229-0c3c-44fd-b9f8-9406b06e350e 3 4743c229-0c3c-44fd-b9f8-9406b06e350e

Actual results:

value Id value Id
1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb 1 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb
1 8dbf1b38-f6af-4b7e-9b3d-65f3df4bb017 2 8dbf1b38-f6af-4b7e-9b3d-65f3df4bb017
1 2eddb5e1-4f3e-4938-ab7b-49740a20a779 3 2eddb5e1-4f3e-4938-ab7b-49740a20a779
2 4743c229-0c3c-44fd-b9f8-9406b06e350e 1 4743c229-0c3c-44fd-b9f8-9406b06e350e
2 f8a66f07-04da-44ab-af7b-47a747dc6bf6 2 f8a66f07-04da-44ab-af7b-47a747dc6bf6
2 034876b6-8cf5-4264-810d-de5202554d77 3 034876b6-8cf5-4264-810d-de5202554d77

I understand that NEWID() will generate unique values for each row where it is used, but I would have thought that when used in a subquery, the values would be generated in that subquery instead of being generated at the outermost level. I can't find a clear explanation if this behavior in the NEWID documentation or elsewhere.

I tried inserting intermediate calculations using CAST(), TOUPPER(), and REPLACE(), which I thought might capture and solidify the NEWID() values. I also tries wrapping the first part up in a CTE. I still cannot get the generated values to repeat.

I can do this in multiple steps using a #temp table or @table variable, but I was just wondering if there is a way to accomplish this in a single statement.

db<>fiddle

Here's another similar case:

-- Another similar case
WITH CTE AS (
    SELECT NEWID() AS Id -- Is this one value or many?
)
SELECT *
FROM ( VALUES (1), (2), (3) ) ID(Id)
CROSS JOIN CTE

Follow-up:

I discovered a similar question on dba stackexchange that pointed me to a 15 year old Microsoft Connect bug report and discussion, that ended up with the bug report being closed as won't-fix, functions-as-designed. The rationale is that, "the optimizer does not guarantee timing or number of executions of scalar functions", and "it's quite hard to pin down precisely ... what once-per-row behavior means."

That same post had a late comment that found a workaround using a combination of GROUP BY and FULL OUTER JOIN to coax the desired behavior, but later comments suggest that this was still undefined behavior.

I also found some 2005 old books-online documentation that includes a best practice: "Limit uses of nondeterministic functions to outermost select lists." I could not find the same in the current documentation, but may not have been looking in the right place.

So although ValNik's answer below suggests a workaround involving GROUP BY, I think that is also think that is built on unstable ground. (ValNik's answer actually begins with a "does not make sense" comment.)

I also discovered that adding an IS NOT NULL check also coaxes the desired result (fiddle). Again this is likely undefined behavior that just seems to work - best avoided.

Bottom line, it is probably best to follow the old "Limit uses of nondeterministic functions to outermost select lists" practice, which means selecting into a temp table or table variable as Thom A recommended and nbk documented as an answer.


Solution

  • You cannot control how/when SQL server evaluates sub expressions. It could evaluate your original query as follows without breaking any rules:

    SELECT S1.value, NEWID() AS Id, S2.value
    FROM GENERATE_SERIES(1, 3) S1
    CROSS APPLY GENERATE_SERIES(1, 2) S2
    ORDER BY S1.value, S2.value
    

    The NEWID function gets called 3 x 2 times to get 6 different values.

    Using temporary tables to store the newid values is the correct thing to do.