pythonsqldjangogeneric-foreign-keydjango-generic-relations

How to use inverse of a GenericRelation


I must be really misunderstanding something with the GenericRelation field from Django's content types framework.

To create a minimal self contained example, I will use the polls example app from the tutorial. Add a generic foreign key field into the Choice model, and make a new Thing model:

class Choice(models.Model):
    ...
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    thing = GenericForeignKey('content_type', 'object_id')

class Thing(models.Model):
    choices = GenericRelation(Choice, related_query_name='things')

With a clean db, synced up tables, and create a few instances:

>>> poll = Poll.objects.create(question='the question', pk=123)
>>> thing = Thing.objects.create(pk=456)
>>> choice = Choice.objects.create(choice_text='the choice', pk=789, poll=poll, thing=thing)
>>> choice.thing.pk
456
>>> thing.choices.get().pk
789

So far so good - the relation works in both directions from an instance. But from a queryset, the reverse relation is very weird:

>>> Choice.objects.values_list('things', flat=1)
[456]
>>> Thing.objects.values_list('choices', flat=1)
[456]

Why the inverse relation gives me again the id from the thing? I expected instead the primary key of the choice, equivalent to the following result:

>>> Thing.objects.values_list('choices__pk', flat=1)
[789]

Those ORM queries generate SQL like this:

>>> print Thing.objects.values_list('choices__pk', flat=1).query
SELECT "polls_choice"."id" FROM "polls_thing" LEFT OUTER JOIN "polls_choice" ON ( "polls_thing"."id" = "polls_choice"."object_id" AND ("polls_choice"."content_type_id" = 10))
>>> print Thing.objects.values_list('choices', flat=1).query
SELECT "polls_choice"."object_id" FROM "polls_thing" LEFT OUTER JOIN "polls_choice" ON ( "polls_thing"."id" = "polls_choice"."object_id" AND ("polls_choice"."content_type_id" = 10))

The Django docs are generally excellent, but I can't understand why the second query or find any documentation of that behaviour - it seems to return data from the wrong table completely?


Solution

  • TL;DR This was a bug in Django 1.7 that was fixed in Django 1.8.

    The change went directly to master and did not go under a deprecation period, which isn't too surprising given that maintaining backwards compatibility here would have been really difficult. More surprising is that there was no mention of the issue in the 1.8 release notes, since the fix changes behavior of currently working code.

    The remainder of this answer is a description of how I found the commit using git bisect run. It's here for my own reference more than anything, so I can come back here if I ever need to bisect a large project again.


    First we setup a django clone and a test project to reproduce the issue. I used virtualenvwrapper here, but you can do the isolation however you wish.

    cd /tmp
    git clone https://github.com/django/django.git
    cd django
    git checkout tags/1.7
    mkvirtualenv djbisect
    export PYTHONPATH=/tmp/django  # get django clone into sys.path
    python ./django/bin/django-admin.py startproject djbisect
    export PYTHONPATH=$PYTHONPATH:/tmp/django/djbisect  # test project into sys.path
    export DJANGO_SETTINGS_MODULE=djbisect.mysettings
    

    create the following file:

    # /tmp/django/djbisect/djbisect/models.py
    from django.db import models
    from django.contrib.contenttypes.models import ContentType
    from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
    
    class GFKmodel(models.Model):
        content_type = models.ForeignKey(ContentType)
        object_id = models.PositiveIntegerField()
        gfk = GenericForeignKey()
    
    class GRmodel(models.Model):
        related_gfk = GenericRelation(GFKmodel)
    

    also this one:

    # /tmp/django/djbisect/djbisect/mysettings.py
    from djbisect.settings import *
    INSTALLED_APPS += ('djbisect',)
    

    Now we have a working project, create the test_script.py to use with git bisect run:

    #!/usr/bin/env python
    import subprocess, os, sys
    
    db_fname = '/tmp/django/djbisect/db.sqlite3'
    if os.path.exists(db_fname):
        os.unlink(db_fname)
    
    cmd = 'python /tmp/django/djbisect/manage.py migrate --noinput'
    subprocess.check_call(cmd.split())
    
    import django
    django.setup()
    
    from django.contrib.contenttypes.models import ContentType
    from djbisect.models import GFKmodel, GRmodel
    
    ct = ContentType.objects.get_for_model(GRmodel)
    y = GRmodel.objects.create(pk=456)
    x = GFKmodel.objects.create(pk=789, content_type=ct, object_id=y.pk)
    
    query1 = GRmodel.objects.values_list('related_gfk', flat=1)
    query2 = GRmodel.objects.values_list('related_gfk__pk', flat=1)
    
    print(query1)
    print(query2)
    
    print(query1.query)
    print(query2.query)
    
    if query1[0] == 789 == query2[0]:
        print('FIXED')
        sys.exit(1)
    else:
        print('UNFIXED')
        sys.exit(0)
    

    The script must be executable, so add the flag with chmod +x test_script.py. It should be located in the directory that Django is cloned into, i.e. /tmp/django/test_script.py for me. This is because import django should pick up the locally checked-out django project first, not any version from site-packages.

    The user interface of git bisect was designed to find out where bugs appeared, so the usual prefixes of "bad" and "good" are backwards when you're trying to find out when a certain bug was fixed. This may seem somewhat upside-down, but the test script should exit with success (return code 0) if the bug is present, and it should fail out (with nonzero return code) if the bug is fixed. This tripped me up a few times!

    git bisect start --term-new=fixed --term-old=unfixed
    git bisect fixed tags/1.8
    git bisect unfixed tags/1.7
    git bisect run ./test_script.py
    

    So this process will do an automated search which eventually finds the commit where the bug was fixed. It takes some time, because there were a lot of commits between Django 1.7 and Django 1.8. It bisected 1362 revisions, roughly 10 steps, and eventually output:

    1c5cbf5e5d5b350f4df4aca6431d46c767d3785a is the first fixed commit
    commit 1c5cbf5e5d5b350f4df4aca6431d46c767d3785a
    Author: Anssi Kääriäinen <akaariai@gmail.com>
    Date:   Wed Dec 17 09:47:58 2014 +0200
    
        Fixed #24002 -- GenericRelation filtering targets related model's pk
    
        Previously Publisher.objects.filter(book=val) would target
        book.object_id if book is a GenericRelation. This is inconsistent to
        filtering over reverse foreign key relations, where the target is the
        related model's primary key.
    

    That's precisely the commit where the query has changed from the incorrect SQL (which gets data from the wrong table)

    SELECT "djbisect_gfkmodel"."object_id" FROM "djbisect_grmodel" LEFT OUTER JOIN "djbisect_gfkmodel" ON ( "djbisect_grmodel"."id" = "djbisect_gfkmodel"."object_id" AND ("djbisect_gfkmodel"."content_type_id" = 8) )
    

    into the correct version:

    SELECT "djbisect_gfkmodel"."id" FROM "djbisect_grmodel" LEFT OUTER JOIN "djbisect_gfkmodel" ON ( "djbisect_grmodel"."id" = "djbisect_gfkmodel"."object_id" AND ("djbisect_gfkmodel"."content_type_id" = 8) )
    

    Of course, from the commit hash we're able to find the pull request and the ticket easily on github. Hopefully this may help someone else one day too - bisecting Django can be tricky to setup due to the migrations!