sql-servert-sql

T-SQL Case expression strange behavior with newid() as randomness source


I'm using SQL Server 2012.

If I do the following to get a list of random-ish numbers in the range [1,3], it works just fine:

SELECT TOP 100 
    ABS(CHECKSUM(NEWID()))%3 + 1 [value_of_rand]
FROM sys.objects

and I get nice things like this (all between 1 and 3).

3
2
2
2
1
....etc.

But if I then put the output of that same chained-random-value function into a CASE expression, it apparently does not produce only the values 1,2,3.

SELECT TOP 100 
    CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
        WHEN 1
            THEN 'one'
        WHEN 2
            THEN 'two'
        WHEN 3
            THEN 'three'
        ELSE
            'that is strange'
    END [value_of_case]
FROM sys.objects

It outputs:

three
that is strange
that is strange
one
two
...etc

What am I doing wrong here?


Solution

  • Your:

    SELECT TOP 100 
        CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
            WHEN 1
                THEN 'one'
            WHEN 2
                THEN 'two'
            WHEN 3
                THEN 'three'
            ELSE
                'that is strange'
        END [value_of_case]
    FROM sys.objects
    

    Actually executed:

    SELECT TOP 100 
        CASE 
            WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 1  THEN 'one'
            WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 2  THEN 'two'
            WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 3  THEN 'three'
            ELSE 'that is strange'
        END [value_of_case]
    FROM sys.objects;
    

    Basically your expression is non-deterministic and each time is evaluated so you can end up with ELSE clause. So there is no bug or catch, just you use it with variable expression and it is perfectly normal behavior.

    This is the same class like COALESCE syntactic-sugar

    The COALESCE expression is a syntactic shortcut for the CASE expression. That is, the code COALESCE(expression1,...n) is rewritten by the query optimizer as the following CASE expression:

    CASE

    WHEN (expression1 IS NOT NULL) THEN expression1

    WHEN (expression2 IS NOT NULL) THEN expression2

    ...

    ELSE expressionN

    END

    This means that the input values (expression1, expression2, expressionN, etc.) will be evaluated multiple times. Also, in compliance with the SQL standard, a value expression that contains a subquery is considered non-deterministic and the subquery is evaluated twice. In either case, different results can be returned between the first evaluation and subsequent evaluations.

    EDIT:

    Solution: SqlFiddle

    SELECT TOP 100 
        CASE t.col
            WHEN 1     THEN 'one'
            WHEN 2     THEN 'two'
            WHEN 3     THEN 'three'
            ELSE      'that is strange'
        END [value_of_case]
    FROM sys.objects
    CROSS APPLY ( SELECT ABS(CHECKSUM(NEWID()))%3 + 1 ) AS t(col)