I am trying to generate and then duplicate multiple UNIQUEIDENTIFIER
values using NEWID()
for use as test data. The following is my attempt is to first generate N1 values and then use a CROSS APPLY
to duplicate these values N2 times each:
SELECT *
FROM (
SELECT
S1.value,
NEWID() AS Id
FROM GENERATE_SERIES(1, 3) S1
) S1
CROSS APPLY (
SELECT
S2.value,
S1.Id -- Trying to repeat the same ID value multiple times
FROM GENERATE_SERIES(1, 2) S2
) S2
ORDER BY S1.Value, S2.Value
Desired results:
value | Id | value | Id |
---|---|---|---|
1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb | 1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb |
1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb | 2 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb |
1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb | 3 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb |
2 | 4743c229-0c3c-44fd-b9f8-9406b06e350e | 1 | 4743c229-0c3c-44fd-b9f8-9406b06e350e |
2 | 4743c229-0c3c-44fd-b9f8-9406b06e350e | 2 | 4743c229-0c3c-44fd-b9f8-9406b06e350e |
2 | 4743c229-0c3c-44fd-b9f8-9406b06e350e | 3 | 4743c229-0c3c-44fd-b9f8-9406b06e350e |
Actual results:
value | Id | value | Id |
---|---|---|---|
1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb | 1 | 26d1f8c0-05e8-4d6f-91ab-f97ce13f79fb |
1 | 8dbf1b38-f6af-4b7e-9b3d-65f3df4bb017 | 2 | 8dbf1b38-f6af-4b7e-9b3d-65f3df4bb017 |
1 | 2eddb5e1-4f3e-4938-ab7b-49740a20a779 | 3 | 2eddb5e1-4f3e-4938-ab7b-49740a20a779 |
2 | 4743c229-0c3c-44fd-b9f8-9406b06e350e | 1 | 4743c229-0c3c-44fd-b9f8-9406b06e350e |
2 | f8a66f07-04da-44ab-af7b-47a747dc6bf6 | 2 | f8a66f07-04da-44ab-af7b-47a747dc6bf6 |
2 | 034876b6-8cf5-4264-810d-de5202554d77 | 3 | 034876b6-8cf5-4264-810d-de5202554d77 |
I understand that NEWID()
will generate unique values for each row where it is used, but I would have thought that when used in a subquery, the values would be generated in that subquery instead of being generated at the outermost level. I can't find a clear explanation if this behavior in the NEWID documentation or elsewhere.
I tried inserting intermediate calculations using CAST()
, TOUPPER()
, and REPLACE()
, which I thought might capture and solidify the NEWID()
values. I also tries wrapping the first part up in a CTE. I still cannot get the generated values to repeat.
I can do this in multiple steps using a #temp table or @table variable, but I was just wondering if there is a way to accomplish this in a single statement.
Here's another similar case:
-- Another similar case
WITH CTE AS (
SELECT NEWID() AS Id -- Is this one value or many?
)
SELECT *
FROM ( VALUES (1), (2), (3) ) ID(Id)
CROSS JOIN CTE
Follow-up:
I discovered a similar question on dba stackexchange that pointed me to a 15 year old Microsoft Connect bug report and discussion, that ended up with the bug report being closed as won't-fix, functions-as-designed. The rationale is that, "the optimizer does not guarantee timing or number of executions of scalar functions", and "it's quite hard to pin down precisely ... what once-per-row behavior means."
That same post had a late comment that found a workaround using a combination of GROUP BY
and FULL OUTER JOIN
to coax the desired behavior, but later comments suggest that this was still undefined behavior.
I also found some 2005 old books-online documentation that includes a best practice: "Limit uses of nondeterministic functions to outermost select lists." I could not find the same in the current documentation, but may not have been looking in the right place.
So although ValNik's answer below suggests a workaround involving GROUP BY
, I think that is also think that is built on unstable ground. (ValNik's answer actually begins with a "does not make sense" comment.)
I also discovered that adding an IS NOT NULL
check also coaxes the desired result (fiddle). Again this is likely undefined behavior that just seems to work - best avoided.
Bottom line, it is probably best to follow the old "Limit uses of nondeterministic functions to outermost select lists" practice, which means selecting into a temp table or table variable as Thom A recommended and nbk documented as an answer.
You cannot control how/when SQL server evaluates sub expressions. It could evaluate your original query as follows without breaking any rules:
SELECT S1.value, NEWID() AS Id, S2.value
FROM GENERATE_SERIES(1, 3) S1
CROSS APPLY GENERATE_SERIES(1, 2) S2
ORDER BY S1.value, S2.value
The NEWID
function gets called 3 x 2 times to get 6 different values.
Using temporary tables to store the newid values is the correct thing to do.