I have 2 related models with 10 Million rows each and want to perform an efficient paginated request of 50 000 items of one of them and access related data on the other one:
class RnaPrecomputed(models.Model):
id = models.CharField(max_length=22, primary_key=True)
rna = models.ForeignKey('Rna', db_column='upi', to_field='upi', related_name='precomputed')
description = models.CharField(max_length=250)
class Rna(models.Model):
id = models.IntegerField(db_column='id')
upi = models.CharField(max_length=13, db_index=True, primary_key=True)
timestamp = models.DateField()
userstamp = models.CharField(max_length=30)
As you can see, RnaPrecomputed
is related to RNA
via a foreign key. Now, I want to fetch a specific page of 50 000 items of RnaPrecomputed
and corresponding Rna
s related to them. I expect N+1 requests problem, if I do this without select_related()
call. Here are the timings:
First, for reference I won't touch the related model at all:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.id)
Takes:
real 0m12.614s
user 0m1.073s
sys 0m0.188s
Now, I'll try accessing data on related model:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.rna.upi)
it takes:
real 2m27.655s
user 1m20.194s
sys 0m4.315s
Which is a lot, so, probably I have N+1 requests problem.
But now, if I use select_related()
,
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all().select_related('rna'), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.rna.upi)
it takes even more:
real 7m9.720s
user 0m1.948s
sys 0m0.337s
So, somehow select_related()
made things 3 times slower, instead of making them faster. And probably without it, I have N+1 requests, so for each entry of RnaPrecomputed
, Django ORM probably has to do an additional request to the database to fetch the corresponding Rna
?
What am I doing wrong and how to make select_related()
perform well with paginated queryset?
It's worth checking that you're not missing an index in your database. You have db_index=True
for the Rna.upi
field, but are you sure the index exists in the database?
If the select_related
is making the count()
query slow, then you could try doing the select_related
on the paginated object_list
.
for object in rna_paginator.page(300).object_list.select_related():
message = message + str(object.rna.upi)