djangodjango-modelssubquerydjango-subquery

How can I ensure correct grouping in a Django subquery annotation?


When I'm writing Django Subquery expressions, sometimes I have no idea how Django is going to group the result.

Example:

subquery = (
    RelatedModel.objects.filter(
        category="some_category",
        grandparentmodel=OuterRef("relatedmodel__grandparentmodel"),
        mymodel__created_at__date=OuterRef("created_at__date"),
    )
    .values("grandparentmodel", "mymodel__created_at__date")
    .annotate(net=Sum("mymodel__amount"))
    .values("net")[:1]
)


query = (
    MyModel.objects.annotate(
        net=Coalesce(
            Subquery(subquery), Decimal("0.00")
        )
    )
)

With this, my goal is to group a bunch of ParentModel instances (with category "some_category") by grandparentmodel and mymodel__created_at__date.

This seems to work only if I include the .values("mymodel__created_at__date") before my annotation. If I do not include that before my annotation of net, the subquery still runs and just gives incorrect net. However, the .values("grandparentmodel") seems not required to get the correct grouping; I can exclude it and I still get the values I'm expecting.

What's going on behind the scenes? How would I know to use a .values() call before my annotation to group correctly? How would I know what to include in said values() call? Is there some rule of thumb to follow when aggregating in subqueries?


Solution

  • Usually, the ORM isn't here to construct exact SQL queries. It is more a tool to transcript your needs into the best query it can find.
    However, what you found is the way to force grouping.

    qs = MyModel.objects.all()
    qs = qs.values("foo", "bar")  #  Fields used for grouping
    qs = qs.annotate(res=Func("baz"))  # Aggregation on other fields
    

    On itself .values() will discard the Model layer and change the queryset results the be dicts containing the keys given as parameters.
    Followed by .annotate() containing an aggregation function on a field that is not in the .values() parameters, it will result in a group by. Results will also be dicts