djangodjango-aggregationdjango-annotatedjango-subquery

Django 1.11 Annotating a Subquery Aggregate


This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:

from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))

They're annotating on the aggregate, which seems weird to me, but whatever.

I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.

spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))

This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.

The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?

Hmm, something's being added to my query's SQL...

I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:

SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";

I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.

Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹

In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC

Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.

I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?


*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.


Solution

  • It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:

    class SQCount(Subquery):
        template = "(SELECT count(*) FROM (%(subquery)s) _count)"
        output_field = models.IntegerField()
    

    You then use this as you would the original Subquery class:

    spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
    Carpark.objects.annotate(space_count=SQCount(spaces))
    

    You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.