
Django ORM querying nested many to many table efficiently?


Let's say I am designing a database for cooking sessions, with the models below:

from django.db import models

class Recipe(models.Model):
    name = models.CharField(max_length=255)
   
    def __str__(self):
        return self.name

class Step(models.Model):
    name = models.CharField(max_length=255)
    recipes = models.ManyToManyField(Recipe, related_name='steps')

    def __str__(self):
        return self.name

class CookingSession(models.Model):
    name = models.CharField(max_length=255)
    steps = models.ManyToManyField(Step, related_name='cooking_sessions')

    def __str__(self):
        return self.name

How can I use a minimal number of queries (preferably one) to get all steps for a given cooking session, where each step includes its corresponding recipes, if any?

from django.contrib.postgres.aggregates import ArrayAgg
from django.db import models

cooking_sessions = CookingSession.objects.annotate(
    step_list=ArrayAgg(models.F("steps__name"), distinct=True),
    recipe_list=ArrayAgg(models.F("steps__recipes__name")),
)

This is what the data looks like:

[
    {
        'id': 1,
        'name': 'Italian Night',
        'step_list': ['Preparation', 'Cooking', 'Serving'],
        'recipe_list': ['Tomato Sauce', 'Pasta Dough', 'Spaghetti', 'Tomato Sauce', 'Garlic Bread']
    },
    ...
]

I would like the data to look like this:

{
    'id': 1,
    'name': 'Italian Night',
    'steps': [
        {
            'step_name': 'Preparation',
            'recipes': ['Tomato Sauce', 'Pasta Dough']
        },
        {
            'step_name': 'Cooking',
            'recipes': ['Spaghetti', 'Tomato Sauce']
        },
        {
            'step_name': 'Serving',
            'recipes': ['Garlic Bread']
        }
    ]
}


Solution

  • You can transform the result of the ArrayAggs [Django-doc] into the desired structure:

    from itertools import groupby
    from operator import itemgetter

    from django.contrib.postgres.aggregates import ArrayAgg

    # Neither aggregate uses distinct=True, so both arrays hold one entry
    # per joined (step, recipe) row and can be zipped pairwise.
    cooking_sessions = CookingSession.objects.annotate(
        step_list=ArrayAgg('steps__name'),
        recipe_list=ArrayAgg('steps__recipes__name'),
    )

    for cooking_session in cooking_sessions:
        # groupby only merges consecutive pairs that share a step name.
        # A step without recipes yields a None entry, which is skipped.
        cooking_session.steps = [
            {'step_name': name, 'recipes': [r for __, r in items if r is not None]}
            for name, items in groupby(
                zip(cooking_session.step_list, cooking_session.recipe_list),
                itemgetter(0),
            )
        ]

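    However, this is quite complicated and error-prone. For example, it assumes that PostgreSQL aggregates steps__name and steps__recipes__name in the same order, an assumption the query itself does not enforce and which might change. It also relies on the rows for one step arriving consecutively, since groupby only merges adjacent items.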

    I would advise simply prefetching the related items with .prefetch_related(…) [Django-doc]. This takes two extra queries in total, not two per CookingSession. So regardless of the number of CookingSessions, Steps and Recipes, everything is fetched in three queries with:

    CookingSession.objects.prefetch_related('steps__recipes')
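
To get from the prefetched objects to the nested structure asked for in the question, a small helper along these lines might be enough. This is only a sketch: the function name serialize_session is hypothetical, and it assumes each object exposes .steps.all() and .recipes.all() as the models above do.

```python
def serialize_session(session):
    """Build the nested dict for one CookingSession.

    Assumes the steps and their recipes were loaded with
    prefetch_related('steps__recipes'), so that .all() reads from the
    prefetch cache instead of issuing a new query per object.
    """
    return {
        'id': session.id,
        'name': session.name,
        'steps': [
            {
                'step_name': step.name,
                # An empty list when the step has no recipes.
                'recipes': [recipe.name for recipe in step.recipes.all()],
            }
            for step in session.steps.all()
        ],
    }


# Hypothetical usage inside a Django project with the models above:
# sessions = CookingSession.objects.prefetch_related('steps__recipes')
# data = [serialize_session(session) for session in sessions]
```

Note that only .all() reuses the prefetched data; calling .filter(…) or .order_by(…) on step.recipes inside the loop would trigger a fresh query for every step.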