I try to code the equivalent Django query from this SQL query, but I'm stuck.
Any help is welcome.
I receive a race id
and from this race I want to do some statistics : nb_race
= the number of races from a Horse before the race given, best_chrono
= best time from a Horse before the race given.
SELECT *, (SELECT count(run.id)
FROM runner run
INNER JOIN race
ON run.race_id = race.id
WHERE run.horse_id = r.horse_id
AND race.datetime_start < rc.datetime_start
) AS nb_race,
(SELECT min(run.chrono)
FROM runner run
INNER JOIN race
ON run.race_id = race.id
WHERE run.horse_id = r.horse_id
AND race.datetime_start < rc.datetime_start
) AS best_time
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890
Django Models:
class Horse(models.Model):
id = AutoField(primary_key=True)
name = models.CharField(max_length=255, blank=True, null=True, default=None)
class Race(models.Model):
id = AutoField(primary_key=True)
datetime_start = models.DateTimeField(blank=True, null=True, default=None)
name = models.CharField(max_length=255, blank=True, null=True, default=None)
class Runner(models.Model):
id = AutoField(primary_key=True)
horse = models.ForeignKey(Horse, on_delete=models.PROTECT)
race = models.ForeignKey(Race, on_delete=models.PROTECT)
chrono = models.DecimalField(max_digits=10, decimal_places=2, blank=True, null=True, default=None)
Subquery expression can be used to compile an additional queryset as subquery that depends on the main queryset and execute them together as one SQL.
from django.db.models import OuterRef, Subquery, Count, Min, F
# prepare a repeated expression about previous runners, but don't execute it yet
prev_run = (
Runner.objects
.filter(
horse=OuterRef('horse'),
race__datetime_start__lt=OuterRef('race__datetime_start'))
.values('horse')
)
queryset = (
Runner.objects
.values('id', 'horse_id', 'race_id', 'chrono', 'race__name', 'race__datetime_start')
.annotate(
nb_race=Subquery(prev_run.annotate(nb_race=Count('id')).values('nb_race')),
best_time=Subquery(prev_run.annotate(best_time=Min('chrono')).values('best_time'))
)
)
Some tricks used here are described in the linked docs:
.values(...)
to one field: only the aggregated value.annotate()
is used in the subquery (not .aggregate()
). That adds a GROUP BY race.horse_id
, but is is not a problem becase there is also WHERE race.horse_id = ...
and the "group by" will be finally ignored by an SQL optimizer in a modern database backend.It is compiled to a query equivalent to the SQL in the example. Check the SQL:
>>> print(str(queryset.query))
SELECT ...,
(SELECT COUNT(U0.id)
FROM runner U0 INNER JOIN race U1 ON (U0.race_id = U1.id)
WHERE (U0.horse_id = runner.horse_id AND U1.datetime_start < race.datetime_start)
GROUP BY U0.horse_id
) AS nb_race,
...
FROM runner INNER JOIN race ON (runner.race_id = race.id)
A marginal difference is that a subquery uses some internal aliases like U0 and U1.