mysql sql django django-queryset django-subquery

Summarized Subquery annotation with a filter by ForeignKey outer reference

I try to code the equivalent Django query from this SQL query, but I'm stuck. Any help is welcome. I receive a race id and from this race I want to do some statistics : nb_race = the number of races from a Horse before the race given, best_chrono = best time from a Horse before the race given.

SELECT *, (SELECT count(run.id)
                FROM runner run
                INNER JOIN race
                ON run.race_id = race.id
                WHERE run.horse_id = r.horse_id
                AND race.datetime_start < rc.datetime_start 
                ) AS nb_race, 
          (SELECT min(run.chrono)
                FROM runner run
                INNER JOIN race
                ON run.race_id = race.id
                WHERE run.horse_id = r.horse_id
                AND race.datetime_start < rc.datetime_start 
                ) AS best_time
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

Django Models:

class Horse(models.Model):
    id = AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Race(models.Model):
    id = AutoField(primary_key=True)
    datetime_start = models.DateTimeField(blank=True, null=True, default=None)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Runner(models.Model):
    id = AutoField(primary_key=True)
    horse = models.ForeignKey(Horse, on_delete=models.PROTECT)
    race = models.ForeignKey(Race, on_delete=models.PROTECT)
    chrono = models.DecimalField(max_digits=10, decimal_places=2, blank=True, null=True, default=None)

Solution

Subquery expression can be used to compile an additional queryset as subquery that depends on the main queryset and execute them together as one SQL.

from django.db.models import OuterRef, Subquery, Count, Min, F

# prepare a repeated expression about previous runners, but don't execute it yet
prev_run = (
    Runner.objects
    .filter(
        horse=OuterRef('horse'),
        race__datetime_start__lt=OuterRef('race__datetime_start'))
    .values('horse')
)
queryset = (
    Runner.objects
    .values('id', 'horse_id', 'race_id', 'chrono', 'race__name', 'race__datetime_start')
    .annotate(
        nb_race=Subquery(prev_run.annotate(nb_race=Count('id')).values('nb_race')),
        best_time=Subquery(prev_run.annotate(best_time=Min('chrono')).values('best_time'))
    )
)

Some tricks used here are described in the linked docs:

Output fields of the Subquery must be restricted by .values(...) to one field: only the aggregated value
A subquery must be a queryset (that is evaluated lazy and combined together), not a value (that would be evaluated immediately and fail). Therefore .annotate() is used in the subquery (not .aggregate()). That adds a GROUP BY race.horse_id, but is is not a problem becase there is also WHERE race.horse_id = ... and the "group by" will be finally ignored by an SQL optimizer in a modern database backend.

It is compiled to a query equivalent to the SQL in the example. Check the SQL:

>>> print(str(queryset.query))
SELECT ...,
  (SELECT COUNT(U0.id)
   FROM runner U0 INNER JOIN race U1 ON (U0.race_id = U1.id)
   WHERE (U0.horse_id = runner.horse_id AND U1.datetime_start < race.datetime_start)
   GROUP BY U0.horse_id
   ) AS nb_race,
   ...
FROM runner INNER JOIN race ON (runner.race_id = race.id)

A marginal difference is that a subquery uses some internal aliases like U0 and U1.