sqldjangoorm

Django: how to effectively use select_related() with Paginator?


I have 2 related models with 10 Million rows each and want to perform an efficient paginated request of 50 000 items of one of them and access related data on the other one:

class RnaPrecomputed(models.Model):
    id = models.CharField(max_length=22, primary_key=True)
    rna = models.ForeignKey('Rna', db_column='upi', to_field='upi', related_name='precomputed')
    description = models.CharField(max_length=250)


class Rna(models.Model):
    id = models.IntegerField(db_column='id')
    upi = models.CharField(max_length=13, db_index=True, primary_key=True)
    timestamp = models.DateField()
    userstamp = models.CharField(max_length=30)

As you can see, RnaPrecomputed is related to RNA via a foreign key. Now, I want to fetch a specific page of 50 000 items of RnaPrecomputed and corresponding Rnas related to them. I expect N+1 requests problem, if I do this without select_related() call. Here are the timings:


First, for reference I won't touch the related model at all:

rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.id)

Takes:

real    0m12.614s
user    0m1.073s
sys 0m0.188s

Now, I'll try accessing data on related model:

rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.rna.upi)

it takes:

real    2m27.655s
user    1m20.194s
sys 0m4.315s

Which is a lot, so, probably I have N+1 requests problem.


But now, if I use select_related(),

rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all().select_related('rna'), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.rna.upi)

it takes even more:

real    7m9.720s
user    0m1.948s
sys 0m0.337s

So, somehow select_related() made things 3 times slower, instead of making them faster. And probably without it, I have N+1 requests, so for each entry of RnaPrecomputed, Django ORM probably has to do an additional request to the database to fetch the corresponding Rna?

What am I doing wrong and how to make select_related() perform well with paginated queryset?


Solution

  • It's worth checking that you're not missing an index in your database. You have db_index=True for the Rna.upi field, but are you sure the index exists in the database?

    If the select_related is making the count() query slow, then you could try doing the select_related on the paginated object_list.

    for object in rna_paginator.page(300).object_list.select_related():
        message = message + str(object.rna.upi)