High-level problem: I want to update an existing table to fill an existing column with a randomly generated 32-bytes, base64-encoded. The random data should be different for each row.
Ignoring for a moment the base64 encoding requirement, the solution is straightforward, as illustrated in this sample code:
DECLARE @table TABLE (
id int,
bin varbinary(max) null
)
-- put a few rows in the table
insert into @table (id) values (1)
insert into @table (id) values (2)
insert into @table (id) values (3)
-- perform the update
update @table
set bin = CRYPT_GEN_RANDOM(32)
-- check result
select *
from @table
This works as expected. CRYPT_GEN_RANDOM(32)
generates a different value for each updated row. Now try to add the base64 encoding requirement:
DECLARE @table TABLE (
id int,
txt nvarchar(max) null
)
-- put a few rows in the table
insert into @table (id) values (1)
insert into @table (id) values (2)
insert into @table (id) values (3)
-- perform the update
update @table
set txt = (SELECT CRYPT_GEN_RANDOM(32) FOR XML PATH(''), BINARY BASE64)
-- check result
select *
from @table
This doesn't work: it puts the same value in every row. I tried packaging the base64 encoding into a UDF, to see if that would help:
CREATE FUNCTION ConvertBytesToBase64
(
@bytes varbinary(max)
)
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @result nvarchar(max)
SET @result = (SELECT @bytes FOR XML PATH(''), BINARY BASE64)
RETURN @result
END
GO
And then the update statement becomes:
update @table
set txt = ConvertBytesToBase64(CRYPT_GEN_RANDOM(32))
But this still produces the same value in every row.
What I fundamentally don't understand is, given that SQL Server evaluates CRYPT_GEN_RANDOM(32)
for every row (which seems sensible), why does it not evaluate ConvertBytesToBase64(CRYPT_GEN_RANDOM(32))
for every row? How can I make it evaluate for every row? (and perhaps related, is there a better way to do base64 encoding in SQL Server 2019+ ?)
I fundamentally don't understand why SQL Server evaluates
CRYPT_GEN_RANDOM(32)
for every row, butConvertBytesToBase64(CRYPT_GEN_RANDOM(32))
is only evaluated once.
Scalar UDFs in SQL Server historically were evaluated RBAR.
You can add WITH INLINE = OFF, SCHEMABINDING
to the function definition to disable the inlining of the function otherwise it can end up treated much the same as your original query (the schemabinding is not required to stop the inlining but is there to prevent an unnecessary spool in the UPDATE
plan for Halloween protection).
Regarding your original query...
update @table
set txt = (SELECT CRYPT_GEN_RANDOM(32) FOR XML PATH(''), BINARY BASE64)
the sub query is no longer a simple scalar expression and requires an XML PATH sub tree. The execution plan generation sees it will only return one row so puts the expensive operation on the outside of the nested loops so it is only evaluated once.
I found adding an OPTION (FORCE ORDER)
stopped this from happening, and it was evaluated on the inside of the nested loops.
You could also add in some bogus correlation as in this example (that appends 0x
to the binary value so doesn't change the result but is now a correlated sub query)
update @table
set txt = (SELECT CRYPT_GEN_RANDOM(32) + SUBSTRING(CAST(id AS binary(4)),1,0) FOR XML PATH(''), BINARY BASE64)
Azure has a built in BASE64_ENCODE
function but this is not in the on prem version yet. I did just try
UPDATE @table
SET txt = BASE64_ENCODE(CRYPT_GEN_RANDOM(32))
in Azure and got "per row" evaluation and a different value in each row.