pythondjangoperformancedjango-rest-frameworkdjango-debug-toolbar

Django rest framework simple model serializer list view takes forever to show 10 records


I have this model -

class News(BaseEntityBasicAbstract, HitCountMixin):
    """
    News added from the dashboard with content
    """
    NEWS_STATUS = (
        ('draft', _('Draft')),
        ('pending', _('Pending')),
        ('review', _('Review')),
        ('public', _('Public')),
        ('private', _('Private'))
    )
    backup = models.BooleanField(default=False)
    prev_id = models.BigIntegerField(null=True, blank=True)
    language = models.CharField(max_length=10, choices=LANGUAGES, default='bn', db_index=True)
    heading = models.CharField(max_length=255, null=True, blank=True,
                               verbose_name=_('News Heading'),
                               help_text=_('Provide a news heading/caption.'))
    sub_caption = models.TextField(max_length=255, null=True, blank=True,
                                   verbose_name=_('Summary'),
                                   help_text=_('Provide summary of the news.'))
    url = models.CharField(max_length=255, unique=True, verbose_name=_('URL/Slug/Link'),
                           help_text=_('Unique url for the news without whitspace.'))
    content = HTMLField(null=True, blank=True, verbose_name=_('Content'),
                        help_text=_('HTML content with texts, links & images.'))
    featured_image = models.FileField(upload_to=FilePrefix('news/'), null=True, blank=True,
                                      verbose_name=_('Featured Image'),
                                      help_text=_('Upload a featured image for news.'))
    image_caption = models.TextField(max_length=255, null=True, blank=True,
                                     verbose_name=_('Image Caption'),
                                     help_text=_('Provide a image caption.'))
    status = models.CharField(max_length=20, choices=NEWS_STATUS, default='pending',
                              verbose_name=_('News Status'), db_index=True,
                              help_text=_('Only public news can be seen on front end.'))
    source = models.ForeignKey(NewsSource, on_delete=models.SET_NULL, null=True, blank=True,
                               verbose_name=_('News Source'),
                               help_text=_('Select a news source.'))
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=True,
                                 verbose_name=_('Category'),
                                 help_text=_('Select a news category.'))
    tags = tagulous.models.TagField(
        blank=True,
        to=Tags,
        verbose_name=_('News Tags'),
        help_text=_('Provide news tags separated with commas.')
    )
    published_at = models.DateTimeField(null=True, blank=True, db_index=True,
                                        verbose_name=_('Published At'))
    menu_items = GenericRelation(MenuItems, object_id_field='id',
                                 related_query_name='news_as_menu')
    hit_count_generic = GenericRelation(HitCount, object_id_field='object_pk',
                                        related_query_name='news_hit_count')
    created_by = models.ForeignKey(User, related_name='news_created_by',
                                   on_delete=models.SET_NULL, null=True, blank=True,
                                   verbose_name=_('Created By'))
    updated_by = models.ForeignKey(User, related_name='news_updated_by',
                                   on_delete=models.SET_NULL, null=True, blank=True,
                                   verbose_name=_('Last Updated By'))
    published_by = models.ForeignKey(User, related_name='news_published_by',
                                     on_delete=models.SET_NULL, null=True, blank=True,
                                     verbose_name=_('Published By'))
    deleted_by = models.ForeignKey(User, related_name='news_deleted_by',
                                   on_delete=models.SET_NULL, null=True, blank=True,
                                   verbose_name=_('Deleted By'))

Below is the serializer -

class NewsSerializer(serializers.ModelSerializer):
    class Meta:
        model = News
        fields = ['id', 'heading', 'sub_caption', 'url', 'content', 'featured_image',
                  'image_caption', 'category', 'source', 'tags', 'published_at']

This is the view -

class NewsViewSets(viewsets.ModelViewSet):
    queryset = News.objects.filter(
        is_active=True,
        status='public'
    )
    serializer_class = NewsSerializer

    def get_queryset(self):
        queryset = self.queryset.filter(
            language=self.request.LANGUAGE_CODE
        ).order_by('-id')

        return queryset

Pagination is set to only 10 in the settings.py, when I hit the news api url it takes 9/10 seconds to load only 10 records. Here's a screenshot showing django-debug-toolbar reports -

djdt image for drf view

I have around 400k records on the database table, it may be an issue, but I think this is too much loading time. Please help me find the problem here! Thanks in advance.


Solution

  • Filtering can often be slow. It looks like you have database indexes on the relevant fields, but take note that if you have multiple filters, only one of the indexes will be used.

    I'm guessing based on your columns but it seems like the most common query will always be looking for is_active=1 and status='public'. If this isn't the case you might have to make some tweaks.

    Firstly, get rid of the db_index=True on the status, is_active, and language fields, otherwise your database writes will be slowed unnecessarily.

    Then you can formulate an index such as this:

    class Meta:
        indexes = [
            models.Index(
                fields=["is_active", "status", "language"],
                name="idx_filtering",
            )
        ]
    

    This will help the database when you filter on all three columns at once. If you're ever filtering on only one of these columns however, you may want to keep the original db_index=True.

    If you were using PostgreSQL, you could do one better:

    class Meta:
        indexes = [
            models.Index(
                fields=["is_active", "status", "language"],
                name="idx_filtering",
                condition=Q(is_active=True, status="public"),
            )
    

    This would reduce the size of the index to only those matching the Q(), making traversing it faster.

    One other thing to note is that pagination using OFFSET is very slow once you get to higher offsets. If at all possible you should be using DRF's cursor pagination instead.