postgresqlouter-join

What are the semantics of "from_item is not null" / "from_item is null"?


Given the following query

select *
from jsonb_to_recordset('[{"a":1},{"a":2,"b":42}]') i(a int, b int)
where i is not null

I observe the result of a single row, which is the one where both columns are non-null.

When negating the where clause to be where i is null, I instead observe the empty result set.

I'm somewhat confused by the first version of that query not resulting in both rows, and by negating the comparison not yielding the complement of the set returned by the first query.

So far I've not been able to find the right place in the documentation where this behaviour is described, and would appreciate any pointers in the right direction to better understand what's happening.

For context, this came up for me when trying to check whether or not there was a matching row in an outer join. To avoid the above somewhat surprising behaviour, I opted to instead check for one of the non-nullable columns of the joined relation to be null/not null, which fixed my immediate problem, but makes me wonder if there's perhaps a better way to perform this kind of check that'd always work, even if all the columns of the relation you're joining are nullable, and without having to enumerate all of the columns and relying on at least one of them to be non-null.


Solution

  • This is documented at https://www.postgresql.org/docs/current/functions-comparison.html for row types (also known as composite types):

    If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests. In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.

    You can generate yourself a cheat sheet using

    SELECT
      x, y.a, b, y,
      y IS NULL AS "y IS NULL",
      y IS NOT NULL AS "y IS NOT NULL",
      y IS DISTINCT FROM NULL AS "y IS DISTINCT FROM NULL"
    FROM (VALUES ('one'), ('two')) AS x(a)
    FULL OUTER JOIN jsonb_to_recordset('[{}, {"a":"one"}, {"b": 1}, {"a":"one","b":2}]') AS y(a text, b int) USING (a)
    ORDER BY num_nonnulls(y, y.a, y.b);
    

    (online demo)

    So for checking whether a row got matched in an outer join, you should use IS DISTINCT FROM NULL.