sqliteselectgroup-bywhere-clausesql-null

Understanding how NULL is treated in GROUP BY versus the WHERE clause?


While trying to isolate the 7 rows in the first query below, I got results that, as a novice, I was not expecting. I read over this SQLite document but do not understand why the 7 rows which are NULL are separated in the GROUP BY in the first query, but testing on != '', not in ('A','H'), and ='' all exclude the NULL rows.

It seems as though the tests are exclusive, such that NULL is either ='' or !='', or in ('A','H') or not in ('A','H'). It appears to be ignored by all of these, yet is separated in the GROUP BY.

Would you please explain why it works this way?

Thank you.

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        group by c
        order by cnt;
c  cnt   
-  ------
   7     
A  4828  
   20046 
H  300679

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c != ''
        group by c
        order by cnt;
c  cnt   
-  ------
A  4828  
H  300679

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c not in ('A', 'H')
        group by c
        order by cnt;
c  cnt   
-  ------
   20046 

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c = ''
        group by c
        order by cnt;
c  cnt   
-  ------
   20046 

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c is null
        group by c
        order by cnt;
c  cnt   
-  ------
   7     


Solution

  • A boolean expression in SQL evaluates as true or false or null.

    The WHERE clause of any SQL statement filters out all rows for which the boolean expression/condition is not true, meaning false and null are filtered out.

    All these boolean expressions:

    null != ''
    null = ''
    null not in ('A', 'H')
    

    are evaluated as null (demo), because any comparison of/to null without the use of the operator IS returns null.

    This is why your 2nd, 3d and 4th queries filter out not only the rows that don't satisfy the condition in the WHERE clause but also the rows where c is null.

    If you want these rows where c is null you must explicitly mention that:

    c != '' OR c IS NULL
    c = '' OR c IS NULL
    c not in ('A', 'H') OR c IS NULL
    

    Or, for the first 2 cases you can use the operator IS:

    c IS NOT ''
    c IS ''