pythondjangodjango-modelsdjango-aggregation

How to filter objects for count annotation in Django?


Consider simple Django models Event and Participant:

class Event(models.Model):
    title = models.CharField(max_length=100)

class Participant(models.Model):
    event = models.ForeignKey(Event, db_index=True)
    is_paid = models.BooleanField(default=False, db_index=True)

It's easy to annotate events query with total number of participants:

events = Event.objects.all().annotate(participants=models.Count('participant'))

How to annotate with count of participants filtered by is_paid=True?

I need to query all events regardless of number of participants, e.g. I don't need to filter by annotated result. If there are 0 participants, that's ok, I just need 0 in annotated value.

The example from documentation doesn't work here, because it excludes objects from query instead of annotating them with 0.

Update. Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0,
        output_field=models.IntegerField()
    )))

Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below. This also works in Django 3.x


Solution

  • Conditional aggregation in Django 2.0+ allows you to further reduce the amount of faff this has been in the past. This will also use Postgres' filter logic, which is somewhat faster than a sum-case (I've seen numbers like 20-30% bandied around).

    Anyway, in your case, we're looking at something as simple as:

    from django.db.models import Q, Count
    events = Event.objects.annotate(
        paid_participants=Count('participant', filter=Q(participant__is_paid=True))
    )
    

    There's a separate section in the docs about filtering on annotations. It's the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.

    For more complex annotation filters, it may be more understandable to structure the filter as a separate queryset that is passed to an __in expression:

    from django.db.models import Q, Count
    paid_participants = Participant.objects.filter(is_paid=True)
    events = Event.objects.annotate(
        paid_participants=Count(
            'participant', 
            filter=Q(participant__in=paid_participants)
        )
    )
    

    In this context, the Participant queryset does not need to explicitly filter on the outer Event ID like a subquery because Count (and other aggregation functions) implicitly filter for reverse foreign key relationships of the current row. In other words, with this format specify your filters globally for the Model that is being counted.