djangodjango-modelsdjango-prefetch-related

How to optimize SQL in tree-like comment structure django?


I have posts with tree comments in it. Comments have likes as well. I've managed to optimize SQL queries that usual comments, related to the post itself, don't create new queries, but answers to the other comments still create 3-4 new requests to the data base. Can I somehow reduce number of queries? I've been using prefetch_related for this purpose.

My Post model:

class Post(models.Model):
    author = models.ForeignKey(
        User,
        on_delete=models.CASCADE
    )
    title = models.CharField(
        max_length=200,
        null=True,
        blank=True
    )
    header_image = models.ImageField(
        null=True,
        blank=True,
        upload_to="posts/headers",
        help_text='Post will start from this image'
    )
    body = CharField(
        max_length=500
    )

    post_date = models.DateTimeField(
        auto_now_add=True
    )
    likes = models.ManyToManyField(
        User,
        through='UserPostRel',
        related_name='likes',
        help_text='Likes connected to the post',
    )

    def total_likes(self):
        return self.likes.count()

Comment model

class Comment(models.Model):
    user = models.ForeignKey(User, related_name='comment_author', on_delete=models.CASCADE)
    post = models.ForeignKey(Post, related_name='comments', on_delete=models.CASCADE)
    body = models.TextField(max_length=255)
    comment_to_reply = models.ForeignKey("self", on_delete=models.CASCADE, null=True, blank=True, related_name='replies')
    likes = models.ManyToManyField(User, through='CommentLikeRelation', related_name='comment_likes')
    created_at = models.DateTimeField(auto_now_add=True)

    def replies_count(self):
        return self.replies.count()

    def total_likes(self):
        return self.likes.count()

    def is_a_leaf(self):
        return self.replies.exists()

    def is_a_reply(self):
        return self.comment_to_reply is not None

class CommentLikeRelation(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    comment = models.ForeignKey(Comment, on_delete=models.CASCADE)

View that process data

    def get(self, request, *args, **kwargs):
        current_user = request.user
        user_profile = User.objects.get(slug=kwargs.get('slug'))
        is_author = True if current_user == user_profile else False

        comment_form = CommentForm()
        Posts = Post.objects.filter(author=user_profile.id)
            .select_related('author')
            .prefetch_related('likes')
            .prefetch_related(
                'comments',
                'comments__user',
                'comments__likes',
                'comments__comment_to_reply',
                'comments__replies',
                'comments__replies__user',
                'comments__replies__likes',
            )
            .order_by('-post_date')
        )

        return render(request, self.template_name,
                      context={
                          'is_author': is_author,
                          'current_user': current_user,
                          'profile': user_profile,
                          'form': comment_form,
                          'posts': posts,
                      })

This is the duplicates and similarities I get


Solution

  • You can use the prefetch_related method to, well, prefetch the related comments and their replies, along with their authors and likes, in a single query. That can be one way to reduce your number of queries.

    How to use it is explained in the Django docs, and beyond that, it's really up to you. For example, you can modify Comment to include a related_name for replies (Django handles it for you), and then use prefetch_related for related comments and replies like this:

    class Comment(models.Model):
        # other fields
        post = models.ForeignKey(Post, related_name='comments', on_delete=models.CASCADE)
        comment_to_reply = models.ForeignKey("self", on_delete=models.CASCADE, null=True, blank=True, related_name='replies')
        # ...
    
    
    # you can do it inline too, if you want
    comment_prefetch = Prefetch('comments', queryset=Comment.objects.select_related('user').prefetch_related('likes', 'replies__user', 'replies__likes'))
    
    # and then you can query
    posts = Post.objects.filter(author=user_profile.id)\
                .select_related('author')\
                .prefetch_related('likes', comment_prefetch)\
    

    This way, you can get all related comments and their replies, along with their authors and likes in one query; Django will handle the rest with object caching, requiring minimal amount of additional queries.

    Or, to further optimize, you can use select_related to obtain related objects in a single query, instead of making additional queries per object; and in the end, annotate the amounts you need as computed values:

    comment_prefetch = Prefetch('comments', queryset=Comment.objects.select_related('user').prefetch_related('likes', 'replies__user', 'replies__likes').annotate(nb_likes=Count('likes'), nb_replies=Count('replies'))
    
    # Post.objects.filter is the same as above