sqlredash

Why is "AND NOT field = 'value'" filtering out nulls?


My query looks basically like

WITH DATA AS(
 SELECT fields
    FROM table
        WHERE [many conditions]
        AND NOT field1 = 'string'           
)

SELECT foo,
    bar,
    CASE
        WHEN field1 IS NULL THEN 'other_string'
        [other cases]
    END
FROM data

There's plenty of other stuff going on, but this is the core bit that isn't working. This should give me plenty of results. but instead, that particular case has zero results; if i cut out the rest of the query and run basically just this, it's just an empty table.

AND NOT field1 = 'string' seems to be filtering out all rows where field1 is 'string', but ALSO all rows where field1 is null. If field1 is null, then field1 = 'string' should evaluate false (since null doesn't equal anything) and therefore NOT field1 = 'string' should evaluate true and those rows should be in the query—right?

Not actually sure what sql implementation is running behind the Redash frontend I'm using but I can find out if it's relevant.


Solution

  • Almost any comparison with NULL returns NULL. Two prominent exceptions are IS NULL and IS NOT NULL, which return either "true" or "false".

    NULL has pretty simple semantics:

    The semantics are easier to follow if you think of NULL as representing an *unknown" value rather than a missing value.

    A WHERE clause only passes through rows that evaluate to "true". So, both "false" and NULL are removed. Note that this is the same for CASE expressions. But, CHECK expressions treat NULL the same as "true". Well, no one ever accused SQL of internal consistency.

    Your expression is:

    WHERE [many conditions] AND NOT field1 = 'string'    
    

    When field1 is NULL this is:

    WHERE [many conditions] AND NOT (NULL = 'string')
    WHERE [many conditions] AND NOT (NULL)
    WHERE [many conditions] AND NULL
    WHERE NULL
    

    Very simple logic.

    Note: Standard SQL has a NULL safe comparator, which would be:

    WHERE [many conditions] AND field1 IS DISTINCT FROM 'string'
    

    Some databases use <=> for this purpose:

    WHERE [many conditions] AND NOT field1 <=>'string'
    

    And in others you need to be more explicit:

    WHERE [many conditions] AND (field1 <> 'string' OR field1 IS NULL)