djangopostgresqlannotationsdjango-queryset

How can I annotate my Django queryset with a count of related objects


I have 3 django models; Machine, Component, and Part.

I am trying to annotate a machine queryset to include the count of unique parts used in that machine.

In the example below, ps1 is used to make f1. ps1 and ps2 are used to make f2. f1 and f2 are used to make r1. Therefore, 2 unique parts are used to create r1 and this is the value I want to use to annotate r1. So far my code is outputting 3 in this case.

    # |------------+---------------------+----------------+--------------+--------------|
    # | machine    | component           | part           | expected     | result       |
    # |            |                     |                | parts        | parts        |
    # |------------+---------------------+----------------+--------------+--------------|
    # | r1         | f1                  | ps1            | 2            | 3            |
    # |            | f2                  | ps1            |              |              |
    # |            |                     | ps2            |              |              |
    # |------------+---------------------+----------------+--------------+--------------|
# part.py
components = models.ManyToManyField("Component", related_name="parts")
# component.py
machine_engine = models.ForeignKey("Machine", related_name="engine_components")
machine_transmission = models.ForeignKey("Machine", related_name="transmission_components")

This is the function I am working with to try and accomplish this.

def annotate_machine_query_with_num_parts(machine_query):
    count_subquery = (
        Parts.objects.filter(pk=OuterRef("pk"))
        .annotate(count=Count("id", distinct=True))
        .values("count")
    )

    sum_subquery = (
        Parts.objects.filter(
            Q(component__machine_id=OuterRef("pk"))
        )
        .annotate(count=Subquery(count_subquery))
        .values("count")
        .annotate(num_parts=Sum("count"))
        .values("num_parts")
    )

    return machine_query.annotate(num_parts=Coalesce(sum_subquery, 0))

This works for many test cases but fails when a part is reused by multiple components, as is the case with ps1 here.

I have tried a few things which have resulted in subquery must return a single row errors. I am sure there must be a simpler way to achieve this.

edit:

I have updated the sample from component.py above. I actually have several ForeignKeys from component to machine which define in what way the component is used in the machine.

Based on the comments, I figured I would be able to do the following to handle this:

    machine_query.annotate(
        num_parts=(
            Count("engine_components__parts"), distinct=True) +
            Count("transmission_components__parts"), distinct=True)
        )
    )

The issue I am facing is that a component can be used in both the engine and the transmission, which is causing that components parts to be double counted, because the parts actually exist as distinct in each of the counts.

Is there a solution that can traverse all the relationships, collect the parts, and then apply the distinct before counting?


Solution

  • I think you are overcomplicating things. You can work with distinct=True [Django-doc]:

    from django.db.models import Count
    
    Machine.annotate(num_parts=Count('component__parts', distinct=True))

    Note: The related_name=… [Django-doc] is the name of the manager to fetch the related objects in reverse. Therefore normally the related_name of a ForeignKey or ManyToManyField is plural, for example components instead of component.