sql sqlite random common-table-expression

Why does RANDOM() in a SQLite CTE JOIN behave differently to other RDBMSs?

RANDOM() values in a Common Table Expression (CTE) join aren't behaving as expected in SQLite.

SQL:

WITH
  tbl1(n) AS (SELECT 1 UNION ALL SELECT 2),
  tbl2(n, r) AS (SELECT n, RANDOM() FROM tbl1)
SELECT * FROM tbl2 t1 CROSS JOIN tbl2 t2;

Sample SQLite results:

n   r                       n   r
1   7058971975145008000     1   8874103142384122000
1   1383551786055205600     2   8456124381892735000
2   2646187515714600000     1   7558324128446983000
2   -1529979429149869800    2   7003770339419606000

Every random number in each column is different. But a CROSS JOIN only repeats rows, so I expected each of the two random values to appear twice in each column, which is what happens in PostgreSQL, Oracle 11g and SQL Server 2014 (when using a row-based seed).

Sample PostgreSQL / Oracle 11g / SQL Server 2014 results:

n   r                   n   r
1   0.117551110684872   1   0.117551110684872
1   0.117551110684872   2   0.221985165029764
2   0.221985165029764   1   0.117551110684872
2   0.221985165029764   2   0.221985165029764

Questions

Can the behaviour in SQLite be explained? Is it a bug?
Is there a way for one table in a CTE (tbl2 above, based on tbl1 in the same WITH clause) to have an additional column of randomly generated numbers that remain fixed when the table is used in a JOIN?

Solution

Your question is rather long and rambling -- not a single question. But it is interesting, and I learned something.

This statement is not true:

SQL Server assigns a random seed to the RAND() function: When used in a SELECT, it is only seeded once rather than for each row.

SQL Server has the concept of run-time constant functions. These are functions that are pulled out of the compiled query and evaluated once per expression at the start of execution, rather than once per row. The most prominent examples are getdate() (and related date/time functions) and rand().

You can readily see this if you run:

select rand(), rand()
from (values (1), (2), (3)) v(x);

Within each column, every row has the same value, but the two columns differ from each other.

Most databases -- including SQLite -- have the more intuitive interpretation of rand()/random(): each time it is called, you get a different value. (As a personal note, a "random" function that returns the same value on each row is highly counter-intuitive.) For SQL Server, you would typically get a different value per row by seeding rand() with an expression based on newid():

select rand(), rand(), rand(checksum(newid()))
from (values (1), (2), (3)) v(x);

As for your second question, it appears that SQLite materializes recursive CTEs. So making tbl2 nominally recursive, by adding a UNION ALL branch that never returns any rows, does what you want:

WITH tbl1(n) AS (
       SELECT 1 UNION ALL SELECT 2
     ),
     tbl2(n, r) AS (
       SELECT n, RANDOM()
       FROM tbl1
       UNION ALL
       -- dummy recursive branch: never returns rows
       SELECT *
       FROM tbl2
       WHERE 1=0
     )
SELECT *
FROM tbl2 t1 CROSS JOIN tbl2 t2;

I have seen no documentation that this is the case, so use at your own risk. Here is a DB-Fiddle.

And, for the record, this seems to work in SQL Server as well. I just learned something!
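If relying on undocumented materialization feels too risky, one alternative is to step outside the CTE and store the random values in a temporary table before joining. This is a different technique from the recursive-CTE trick above, not a variation of it. The sketch below uses Python's standard sqlite3 module against an in-memory database; the function name fixed_random_cross_join is made up for illustration.

```python
import sqlite3


def fixed_random_cross_join(conn):
    """Store one RANDOM() value per row in a temp table, then self-join it.

    Hypothetical helper: the table name tbl2 mirrors the question, but the
    temp-table approach is an assumption, not part of the original answer.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS temp.tbl2;
        -- RANDOM() is evaluated once per row here and the result is stored.
        CREATE TEMP TABLE tbl2 AS
        SELECT n, RANDOM() AS r
        FROM (SELECT 1 AS n UNION ALL SELECT 2);
        """
    )
    # Both sides of the join read the stored values, so r cannot change.
    return conn.execute(
        "SELECT t1.n, t1.r, t2.n, t2.r FROM tbl2 t1 CROSS JOIN tbl2 t2"
    ).fetchall()


conn = sqlite3.connect(":memory:")
for row in fixed_random_cross_join(conn):
    print(row)
```

Because the values are written to a table before the join runs, each n keeps a single r no matter how the planner treats CTEs. The cost is an extra statement and a temporary object to clean up.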

EDIT:

As suggested in the comment, the materialization may not always happen. It does seem to apply to two references at the same level:

WITH tbl1(n) AS (
       SELECT 1 UNION ALL SELECT 2
     ),
     tbl2(n, r) AS (
       SELECT n, RANDOM()
       FROM tbl1
       UNION ALL
       SELECT *
       FROM tbl2
       WHERE 1=0
     )
SELECT t2a.r, COUNT(*)
FROM tbl2 t2a LEFT JOIN
     tbl2 t2b
     ON t2a.r = t2b.r
GROUP BY t2a.r;
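Since the materialization is not guaranteed, it may be worth checking how your own SQLite build behaves. The following is a rough sketch using Python's standard sqlite3 module: it runs the plain CTE from the question and the recursive variant from this answer, and reports whether each n keeps a single r across the whole CROSS JOIN result. The helper name r_is_stable and the two query constants are hypothetical names chosen for this example.

```python
import sqlite3

# The query from the question, with explicit column names.
PLAIN_CTE = """
WITH tbl1(n) AS (SELECT 1 UNION ALL SELECT 2),
     tbl2(n, r) AS (SELECT n, RANDOM() FROM tbl1)
SELECT t1.n, t1.r, t2.n, t2.r FROM tbl2 t1 CROSS JOIN tbl2 t2
"""

# The same query with the dummy recursive branch from the answer.
RECURSIVE_TRICK = """
WITH tbl1(n) AS (SELECT 1 UNION ALL SELECT 2),
     tbl2(n, r) AS (
       SELECT n, RANDOM() FROM tbl1
       UNION ALL
       SELECT * FROM tbl2 WHERE 1=0
     )
SELECT t1.n, t1.r, t2.n, t2.r FROM tbl2 t1 CROSS JOIN tbl2 t2
"""


def r_is_stable(conn, sql):
    """Return True if every n is paired with exactly one r in the result."""
    seen = {}
    for n1, r1, n2, r2 in conn.execute(sql):
        seen.setdefault(n1, set()).add(r1)
        seen.setdefault(n2, set()).add(r2)
    return all(len(values) == 1 for values in seen.values())


conn = sqlite3.connect(":memory:")
print("SQLite library version:", sqlite3.sqlite_version)
print("plain CTE stable:      ", r_is_stable(conn, PLAIN_CTE))
print("recursive trick stable:", r_is_stable(conn, RECURSIVE_TRICK))
```

No expected output is shown because the answer can differ between SQLite versions and query plans. A True result only tells you what happened for this query on this build, not that the behaviour is guaranteed.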