mysqlsqldjangodjango-querysetdjango-subquery

Summarized Subquery annotation with a filter by ForeignKey outer reference


I try to code the equivalent Django query from this SQL query, but I'm stuck. Any help is welcome. I receive a race id and from this race I want to do some statistics : nb_race = the number of races from a Horse before the race given, best_chrono = best time from a Horse before the race given.

SELECT *, (SELECT count(run.id)
                FROM runner run
                INNER JOIN race
                ON run.race_id = race.id
                WHERE run.horse_id = r.horse_id
                AND race.datetime_start < rc.datetime_start 
                ) AS nb_race, 
          (SELECT min(run.chrono)
                FROM runner run
                INNER JOIN race
                ON run.race_id = race.id
                WHERE run.horse_id = r.horse_id
                AND race.datetime_start < rc.datetime_start 
                ) AS best_time
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

Django Models:

class Horse(models.Model):
    id = AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Race(models.Model):
    id = AutoField(primary_key=True)
    datetime_start = models.DateTimeField(blank=True, null=True, default=None)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Runner(models.Model):
    id = AutoField(primary_key=True)
    horse = models.ForeignKey(Horse, on_delete=models.PROTECT)
    race = models.ForeignKey(Race, on_delete=models.PROTECT)
    chrono = models.DecimalField(max_digits=10, decimal_places=2, blank=True, null=True, default=None)

Solution

  • Subquery expression can be used to compile an additional queryset as subquery that depends on the main queryset and execute them together as one SQL.

    from django.db.models import OuterRef, Subquery, Count, Min, F
    
    # prepare a repeated expression about previous runners, but don't execute it yet
    prev_run = (
        Runner.objects
        .filter(
            horse=OuterRef('horse'),
            race__datetime_start__lt=OuterRef('race__datetime_start'))
        .values('horse')
    )
    queryset = (
        Runner.objects
        .values('id', 'horse_id', 'race_id', 'chrono', 'race__name', 'race__datetime_start')
        .annotate(
            nb_race=Subquery(prev_run.annotate(nb_race=Count('id')).values('nb_race')),
            best_time=Subquery(prev_run.annotate(best_time=Min('chrono')).values('best_time'))
        )
    )
    

    Some tricks used here are described in the linked docs:

    It is compiled to a query equivalent to the SQL in the example. Check the SQL:

    >>> print(str(queryset.query))
    SELECT ...,
      (SELECT COUNT(U0.id)
       FROM runner U0 INNER JOIN race U1 ON (U0.race_id = U1.id)
       WHERE (U0.horse_id = runner.horse_id AND U1.datetime_start < race.datetime_start)
       GROUP BY U0.horse_id
       ) AS nb_race,
       ...
    FROM runner INNER JOIN race ON (runner.race_id = race.id)
    

    A marginal difference is that a subquery uses some internal aliases like U0 and U1.