sqlpostgresqlnullin-clause

SQL and NOT IN with NULLs


I was actually trying some thing similar with a sub query, and couldn’t understand what was going wrong. I have managed to simplify the question to the following.

I have a simple table which may include a NULL in one of the columns:

DROP TABLE IF EXISTS data;
CREATE TEMP TABLE data (
    id INT PRIMARY KEY,
    number INT,
    string TEXT
);
INSERT INTO data(id,number,string)
VALUES (1,1,'Apple'),(2,1,'Accordion'),(3,2,'Banana'),(4,2,'Banjo'),(5,NULL,'Whatever');
SELECT * FROM data WHERE number IN(1,2,NULL);
SELECT * FROM data WHERE number NOT IN(1,2,NULL);

There is a live version at https://dbfiddle.uk/KhTzbX_E .

When I look for rows matching the number column:

SELECT * FROM data WHERE number IN(1,2,NULL);

I get a few results, as expected. This doesn’t include the row where number is NULL, but I suppose that the IN expression is short for WHERE a = b.

If I look for the non-matches:

SELECT * FROM data WHERE number NOT IN(1,2,NULL);

I get nothing at all.

I can’t see how that can be right. The expression IN(1,2,NULL) must return a valid list otherwise the first one wouldn’t work.

What is going on here, and is there a correct way to do this?

Note: I know it’s silly to put in the NULL, but the idea is that the list is supposed to be a sub query which might return a vew NULLs. I also know that I can filter out the NULLs in the sub query. However that looks like a workaround to me.

I have tried this in PostgreSQL, MariaDB and Microsoft SQL Server.


Solution

  • This is an old trap for people who are not experienced with NULL: if a NOT IN list contains NULL, the result set is always empty.

    To understand why, let's rewrite

    WHERE number NOT IN (1, 2, NULL)
    

    to the semantically equivalent

    WHERE number <> 1 AND number <> 2 AND number <> NULL
    

    Then note that number <> NULL will always return NULL. This is easiest to understand if you think of “NULL” as “unknown”: when asked if an unknown number is different from any given number, the answer could be “true” or “false”, depending on the unknown value of the unknown number. So the answer must be “unknown”, which is the boolean value NULL.

    Now something AND NULL can be FALSE (if something is FALSE) or NULL (if something is NULL or TRUE), but it never can be TRUE. And a WHERE condition only passes rows where the condition is TRUE. Both FALSE and NULL won't pass.