sqlsql-serverstringisnullnullif

SQL Server : ISNULL(compound NULL condition, 'a string') returns only the 1st character, under certain circumstance(s)


I'm a self-taught, vaguely competent SQL user. For a view that I'm writing, I'm trying to develop a 'conditional LEFT' string-splitting command (presumably later to be joined by a 'conditional RIGHT' - whereby:

So, if our pattern is ' - ',

Rather than using the crudest ways to do this, I'm trying to come up with a way that avoids having to repeat any clause (such as if 0 < CHARINDEX, then take CHARINDEX - 1, etc.) and instead leverages conditional NULLing.

Yet - here's what I get for trying to be creative - I've hit what seems to be a really basic stumbling block. Please observe the following code and results, and let me know whether you can replicate it - and hence whether it's a bug or I've missed something peculiar. I have tested this on SQL Server both 2008 R2 and 2014, both Express editions.

select
    -- ISNULL: returns 'a big old string'
    ISNULL(null, 'a big old string'),

    -- NULLIF: returns NULL
    left(
        'a big old string',
        nullif
        (
            CHARINDEX
            (
                'needle',
                'haystack'
            ), 0
        ) - 1
    ),

    -- combined: returns just 'a' (1st character of ISNULL condition)
    ISNULL(
        left
        (
            'a big old string', -- the input string. In reality, this would be a column alias, etc.
            nullif
            (
                CHARINDEX       -- Search for the splitting pattern
                (
                    'needle',
                    'haystack'
                ), 0            -- If it's not found, return NULL instead of the usual 0
            ) - 1               -- so that this subtraction produces a NULL, not an invalid negative index
        ),
        'a big old string'      -- If the pattern was not found, we should return the input unaltered
    );

/*
---------------- ---- ----
a big old string NULL a

(1 row(s) affected)
*/

Why do these 2 clauses work as expected in isolation, but when I combine them, rather than getting the sum of their effects, I get only the 1st character of the ISNULL string - 'a'?

Is there some kind of implicit CAST to varchar(1)? Deliberately casting to varchar(max) made no difference. What else could be going on here?

Am I just doing something really stupid? Because from here, I can't figure out what I'm doing wrong, and so it really seems like a bug. I hoped testing on 2014 would prove it to be a bug in the old 2008 R2, but alas, they act identically (or, rather, don't).

Thanks in advance for, hopefully, saving me from what would presumably be an evening of baffled existential crisis.


Solution

  • There are two parts to this problem, the first is the nature of the ISNULL operator, it will use the datatype and length of the first argument. A simple example would be:

    DECLARE @A CHAR(1) = NULL,
            @B VARCHAR(MAX) =  'This is a test';
    
    SELECT TOP 1 Test = ISNULL(@A, @B);
    

    This returns T and checking the execution plan XML we can see the implicit conversion of "This is a Test" to CHAR(1):

    <ScalarOperator ScalarString="isnull([@A],CONVERT_IMPLICIT(char(1),[@B],0))">
        <Intrinsic FunctionName="isnull">
        <ScalarOperator>
            <Identifier>
            <ColumnReference Column="@A" />
            </Identifier>
        </ScalarOperator>
        <ScalarOperator>
            <Convert DataType="char" Length="1" Style="0" Implicit="true">
            <ScalarOperator>
                <Identifier>
                <ColumnReference Column="@B" />
                </Identifier>
            </ScalarOperator>
            </Convert>
        </ScalarOperator>
        </Intrinsic>
    </ScalarOperator>
    

    Your example is not quite so straightforward since you don't have your types nicely defined like above, but if we do define the dataypes:

    DECLARE @A VARCHAR(MAX) =  'a big old string',
            @B VARCHAR(MAX) = 'needle',
            @C VARCHAR(MAX) = 'haystack';
    
    SELECT TOP 1 ISNULL(LEFT(@A, NULLIF(CHARINDEX(@B, @C), 0) - 1), @A);
    

    We get the result as expected. So something else is happening under the hood. The query plan does not delve into the inner workings of the constant evaluation, but the following demonstrates what is happening:

    SELECT  Test = LEFT('a big old string', NULLIF(CHARINDEX('needle', 'haystack'), 0) - 1)
    INTO    #T;
    
    SELECT  t.name, c.max_length
    FROM    tempdb.sys.columns AS c
            INNER JOIN sys.types AS t
                ON t.system_type_id = c.system_type_id
                AND t.user_type_id = c.user_type_id
    WHERE   [object_id] = OBJECT_ID(N'tempdb..#T');
    
    ----------------
    name        max_length
    varchar     1
    

    Basically, by using the SELECT INTO sytax with your left expression shows that a when the NULL length is passed to LEFT the resulting datatype is VARCHAR(1), however, this is not always the case. If I simply hard code NULL into the LEFT function:

    SELECT  Test = LEFT('a big old string', NULL)
    INTO    #T;
    
    --------------------
    name        max_length
    varchar     16
    

    Then you get the length of the string passed, but a case expression that should be optimised away to the same thing, yields a length of 1 again:

    SELECT  TOP 1 Test = LEFT('a big old string', CASE WHEN 1 = 1 THEN NULL ELSE 1 END)
    INTO    #T;
    
    ----------------
    name        max_length
    varchar     1
    

    I suspect it is related to the default behaviour of VARCHAR, where the default length is 1, e.g:

    DECLARE @A VARCHAR = 'This is a Test';
    
    SELECT  Value = @A,                                         -- T
            MaxLength = SQL_VARIANT_PROPERTY(@A, 'MaxLength')   -- 1
    

    But I can't tell you why you would see different behaviour for NULL and CASE WHEN 1 = 1 THEN NULL ELSE 1 END. If you wanted to get the bottom of what is going on in the constant evaluation I think you would probably need to re-ask on the DBA site and hope that one of the real SQL Server Gurus picks it up.

    In summary, LEFT(<constant>, <constant expression>) where <constant expression> yields NULL is implicitly typed as VARCHAR(1), and this implicit type is used in ISNULL evaluation.

    For what it is worth, if you explicitly type the result of your LEFT function then you get the expected result:

    SELECT ISNULL(
                CAST(
                    LEFT(
                        'a big old string', 
                        NULLIF(CHARINDEX('needle', 'haystack'), 0) - 1
                        ) 
                    AS VARCHAR(MAX))
                    , 'a big old string');
    

    An additional point is that when you say you don't want to repeat any expressions (If 0 < CHARINDEX, then take CHARINDEX - 1, etc.), there are two things you should know, the first is that NULLIF(<expression>, <value>) expands to a case expression - CASE WHEN <expression> = <value> THEN NULL ELSE <expression> END, so is repeated, the second is that this doesn't matter, SQL Server can identify that this is the same expression used twice, and will evaluate it once and refer to the same result each time it is used.