I have 3 django models; Machine
, Component
, and Part
.
I am trying to annotate a machine queryset to include the count of unique parts used in that machine.
In the example below, ps1 is used to make f1
. ps1
and ps2
are used to make f2
. f1
and f2
are used to make r1
. Therefore, 2 unique parts are used to create r1
and this is the value I want to use to annotate r1
. So far my code is outputting 3 in this case.
# |------------+---------------------+----------------+--------------+--------------|
# | machine | component | part | expected | result |
# | | | | parts | parts |
# |------------+---------------------+----------------+--------------+--------------|
# | r1 | f1 | ps1 | 2 | 3 |
# | | f2 | ps1 | | |
# | | | ps2 | | |
# |------------+---------------------+----------------+--------------+--------------|
Part
has a ManyToManyField
of Component
.# part.py
components = models.ManyToManyField("Component", related_name="parts")
Component
has several ForeignKey
s to Machine
.# component.py
machine_engine = models.ForeignKey("Machine", related_name="engine_components")
machine_transmission = models.ForeignKey("Machine", related_name="transmission_components")
This is the function I am working with to try and accomplish this.
def annotate_machine_query_with_num_parts(machine_query):
count_subquery = (
Parts.objects.filter(pk=OuterRef("pk"))
.annotate(count=Count("id", distinct=True))
.values("count")
)
sum_subquery = (
Parts.objects.filter(
Q(component__machine_id=OuterRef("pk"))
)
.annotate(count=Subquery(count_subquery))
.values("count")
.annotate(num_parts=Sum("count"))
.values("num_parts")
)
return machine_query.annotate(num_parts=Coalesce(sum_subquery, 0))
This works for many test cases but fails when a part is reused by multiple components, as is the case with ps1
here.
I have tried a few things which have resulted in subquery must return a single row
errors. I am sure there must be a simpler way to achieve this.
edit:
I have updated the sample from component.py
above. I actually have several ForeignKey
s from component to machine which define in what way the component is used in the machine.
Based on the comments, I figured I would be able to do the following to handle this:
machine_query.annotate(
num_parts=(
Count("engine_components__parts"), distinct=True) +
Count("transmission_components__parts"), distinct=True)
)
)
The issue I am facing is that a component can be used in both the engine and the transmission, which is causing that components parts to be double counted, because the parts actually exist as distinct in each of the counts.
Is there a solution that can traverse all the relationships, collect the parts, and then apply the distinct before counting?
I think you are overcomplicating things. You can work with distinct=True
[Django-doc]:
from django.db.models import Count
Machine.annotate(num_parts=Count('component__parts', distinct=True))
Note: The
related_name=…
[Django-doc] is the name of the manager to fetch the related objects in reverse. Therefore normally therelated_name
of aForeignKey
orManyToManyField
is plural, for examplecomponents
instead of.component