djangodjango-modelsdjango-3.2

Why does a Django foreign key id __in query fail to match None?


When filtering a queryset on a nullable foreign key, I can filter by an ID value (foo_id=123) or by None (foo_id=None). However, if I attempt to filter by a list (foo_id__in=[123, None]) the None is ignored.

Why is this happening, and what is the best workaround for filtering on a foreign key using a list which includes None?

Example:

from django.db import models

class Foo(models.Model):
  name = models.CharField(max_length=100)

class Bar(models.Model):
  foo = models.ForeignKey(Foo, on_delete=models.PROTECT,
                          blank=True, null=True)
foo = Foo.objects.create(name='myfoo')
Bar.objects.create(foo=foo)
Bar.objects.create(foo=None)

Bar.objects.count()                                    # 2
Bar.objects.filter(foo_id=foo.id).count()              # 1
Bar.objects.filter(foo_id=None).count()                # 1
Bar.objects.filter(foo_id__in=[foo.id, None]).count()  # 1 - Expected 2!

Solution

  • The key insight is that in SQL, NULL represents an unknown value, which cannot be compared using normal operators:

    SELECT NULL = NULL;
    -- => NULL
    

    (This syntax does not work in all DB engines - for example, SQL Server - and in those you would have to write something like SELECT CASE WHEN NULL = NULL THEN 't' ELSE 'f' END, but the result is same: NULL = NULL evaluates as NULL, which is falsy.)

    The reasoning is, for example, if you have two people whose last name you don't know, you'd mark them as NULL - and just because they are both NULL, you can't conclude that they have the same last name (just like you can't conclude they have a different one - you just don't know one way or the other).

    So, a NULL is not equal to another NULL... but it's also not different than another NULL: NULL <> NULL also returns NULL. In fact, NULL infects all operators: 1 + NULL, 1 < NULL, 1 >= NULL... all result in NULL. If you do anything with an unknown value, the result is an unknown value.

    There is basically only one operator that avoids this infectious nature of NULL, and that is IS NULL:

    SELECT NULL IS NULL;
    -- => t
    

    In the same vein as x = NULL, x IN (NULL) is also using the same equality comparison, which can never evaluate as true:

    SELECT 2 IN (1, NULL);
    -- => NULL
    SELECT NULL IN (1, NULL);
    -- => NULL
    SELECT 1 IN (1, NULL);
    -- => t
    
    SELECT 2 NOT IN (1, NULL);
    -- => NULL
    SELECT NULL NOT IN (1, NULL);
    -- => NULL
    SELECT 1 NOT IN (1, NULL);
    -- => t
    

    Is 2 there? Maaaaybe; I have a value that I don't know what it is, so I can't say whether 2 is in there or not, because it just might match that unknown one. How about this other thing that I have no idea about? Beats me, might be 1, or might be equal to the other unknown thing - or it might be something different altogether. How about 1? Well, as to that, I can see 1 in there, regardless of what the unknown thing is or isn't.

    So instead of plain IN, you have to check for NULL explicitly:

    SELECT * WHERE foo_id IN (1, 2) OR foo_id IS NULL;
    

    In django terms:

    from django.db.models import Q
    Bar.objects.filter(Q(foo_id=foo.id) | Q(foo_id__isnull=True)).count()
    

    or if you have several values

    Bar.objects.filter(Q(foo_id__in=[1, 2]) | Q(foo_id__isnull=True)).count()