
What's the difference between select_related and prefetch_related in Django ORM?


From the Django docs:

select_related() "follows" foreign-key relationships, selecting additional related-object data when it executes its query.

prefetch_related() does a separate lookup for each relationship, and does the "joining" in Python.

What does "doing the joining in Python" mean? Can someone illustrate with an example?

My understanding is that for ForeignKey relationships I should use select_related, and for M2M relationships I should use prefetch_related. Is this correct?


Solution

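  • Your understanding is mostly correct. Use select_related when the related object is a single object (a ForeignKey or OneToOneField), and prefetch_related when you are going to get back a set of objects (a ManyToManyField, as you said, or a reverse ForeignKey).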

    Just to clarify what I mean by reverse ForeignKeys, here's an example:

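    from django.db import models

    class ModelA(models.Model):
        pass
    
    class ModelB(models.Model):
        # on_delete is a required argument since Django 2.0
        a = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    
    # Forward ForeignKey relationship: a single query with a JOIN
    ModelB.objects.select_related('a').all()
    
    # Reverse ForeignKey relationship: 'modelb_set' is the default accessor name
    ModelA.objects.prefetch_related('modelb_set').all()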
    

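    The difference is that select_related does an SQL join, so the related object's columns come back from the database as part of the same result rows. prefetch_related instead runs an additional query for each relationship and then matches the results to their parent objects in Python, which is what "doing the joining in Python" means. This avoids repeating the columns of the original object (ModelA in the example above) on every row.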

    You may use prefetch_related for anything that you can use select_related for.
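A rough illustration of the two strategies, written against the standard library's sqlite3 module rather than Django itself. The table names modela/modelb and the helpers select_related_style and prefetch_related_style are hypothetical; they only mimic the shape of the queries Django issues for the ModelA/ModelB example, not Django's actual implementation.

```python
import sqlite3

# Hypothetical in-memory schema mirroring ModelA/ModelB:
# each modelb row points at one modela row through its a_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modela (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE modelb (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES modela(id));
    INSERT INTO modela (id, name) VALUES (1, 'first'), (2, 'second');
    INSERT INTO modelb (id, a_id) VALUES (10, 1), (11, 1), (12, 2);
""")


def select_related_style(conn):
    """One query: the database does the join, so each row carries both sides."""
    rows = conn.execute(
        "SELECT b.id, a.id, a.name FROM modelb AS b "
        "INNER JOIN modela AS a ON b.a_id = a.id ORDER BY b.id"
    ).fetchall()
    # ModelA's columns are repeated on every ModelB row that points at it.
    return [(b_id, (a_id, a_name)) for b_id, a_id, a_name in rows]


def prefetch_related_style(conn):
    """Two queries: fetch the parents, fetch the children, match them in Python."""
    parents = conn.execute("SELECT id, name FROM modela ORDER BY id").fetchall()
    parent_ids = [pid for pid, _ in parents]
    if not parent_ids:
        return []
    # Second query: the parent keys are sent back to the server in an IN (...) list.
    placeholders = ", ".join("?" for _ in parent_ids)
    children = conn.execute(
        f"SELECT id, a_id FROM modelb WHERE a_id IN ({placeholders}) ORDER BY id",
        parent_ids,
    ).fetchall()
    # The "join in Python": group child ids under their foreign key value.
    by_parent = {pid: [] for pid in parent_ids}
    for child_id, a_id in children:
        by_parent[a_id].append(child_id)
    return [(pid, name, by_parent[pid]) for pid, name in parents]


print(select_related_style(conn))
print(prefetch_related_style(conn))
```

The first helper gets everything from one joined query; the second needs two queries plus a dictionary to attach each child to its parent, and that dictionary step is the part Django does in Python.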

    The tradeoff is that prefetch_related has to build a list of IDs and send it back to the server, which can take a while. I'm not sure whether there's a nicer way of doing this in a transaction, but my understanding is that Django basically just sends the list and says SELECT ... WHERE pk IN (...,...,...). If the prefetched data is sparse (say, U.S. State objects linked to people's addresses), this can work very well; if it's closer to one-to-one, it can waste a lot of communication. If in doubt, try both and see which performs better.
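A back-of-the-envelope count of what each strategy ships over the wire, assuming N parent rows that each have one non-null forward ForeignKey pointing at one of D distinct related rows. The Transfer tuple and compare() helper are hypothetical; the model deliberately ignores row width, indexes, query planning and latency, and it leaves out the parent rows because both strategies send those exactly once.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    queries: int        # round trips to the database
    related_rows: int   # copies of the related object's columns sent back
    ids_sent: int       # keys shipped to the server inside IN (...)


def compare(n_parents: int, n_distinct_related: int) -> tuple:
    """Crude (join, prefetch) transfer counts under the assumptions above."""
    # JOIN: one query, but the related columns ride along on every parent row.
    join = Transfer(queries=1, related_rows=n_parents, ids_sent=0)
    # Prefetch: each distinct related row comes back once, but its key must
    # first travel to the server in the IN list, costing a second query.
    prefetch = Transfer(
        queries=2,
        related_rows=n_distinct_related,
        ids_sent=n_distinct_related,
    )
    return join, prefetch


print(compare(1000, 50))    # sparse: 1000 people, 50 states
print(compare(1000, 1000))  # close to one-to-one: 1000 people, 1000 addresses
```

In the sparse case prefetching sends 50 state rows instead of 1000 copies, at the price of a 50-ID IN list and one extra round trip. In the one-to-one case it sends just as many related rows as the JOIN and adds a 1000-ID IN list on top, which is the wasted communication described above.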

    Everything discussed above is basically about communication with the database. On the Python side, however, prefetch_related has the extra benefit that a single Python object represents each row in the database. With select_related, duplicate objects are created in Python for each "parent" object. Since Python objects carry a fair amount of memory overhead, this can also be a consideration.
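A plain-Python sketch of the object-identity point, using hypothetical State and Person classes in place of Django models. It only models the behavior described above (one related object built per joined row versus one per related database row); it is not Django's instantiation code.

```python
class State:
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


class Person:
    def __init__(self, pk, state):
        self.pk = pk
        self.state = state


# Hypothetical rows: three people, two of whom live in the same state.
PEOPLE = [(1, 10), (2, 10), (3, 20)]   # (person pk, state pk)
STATES = {10: "Ohio", 20: "Texas"}     # state pk -> name


def build_from_join():
    # Each joined row carries the state's columns, so a new State is built per row.
    return [Person(pk, State(state_pk, STATES[state_pk])) for pk, state_pk in PEOPLE]


def build_from_prefetch():
    # Each state comes back once; the same instance is attached to every person.
    states = {pk: State(pk, name) for pk, name in STATES.items()}
    return [Person(pk, states[state_pk]) for pk, state_pk in PEOPLE]


joined = build_from_join()
prefetched = build_from_prefetch()
print(joined[0].state is joined[1].state)          # False: two copies of state 10
print(prefetched[0].state is prefetched[1].state)  # True: one shared object
print(len({id(p.state) for p in joined}), len({id(p.state) for p in prefetched}))  # 3 2
```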