python django django-orm

Is it possible to switch to a through model in one release?


Assume I have these Django models:

class Book(models.Model):
    title = models.CharField(max_length=100)

class Author(models.Model):
    name = models.CharField(max_length=100)
    books = models.ManyToManyField(Book)

I already have a production system with several objects and several Author <-> Book connections.

Now I want to switch to:

class Book(models.Model):
    title = models.CharField(max_length=100)

class BookAuthor(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    author = models.ForeignKey("Author", on_delete=models.CASCADE)
    impact = models.IntegerField(default=1)

    class Meta:
        unique_together = ("book", "author")

class Author(models.Model):
    name = models.CharField(max_length=100)
    books = models.ManyToManyField(Book, through=BookAuthor)

If I write this migration:

from django.db import migrations, models

def migrate_author_books(apps, schema_editor):
    Author = apps.get_model('yourappname', 'Author')
    BookAuthor = apps.get_model('yourappname', 'BookAuthor')

    for author in Author.objects.all():
        for book in author.books.all():
            # Create a BookAuthor entry with default impact=1
            BookAuthor.objects.create(author=author, book=book, impact=1)

class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', 'previous_migration_file'),
    ]

    operations = [
        migrations.CreateModel(name="BookAuthor", ...),
        migrations.RunPython(migrate_author_books),
        migrations.RemoveField(model_name="author", name="books"),
        migrations.AddField(model_name="author", name="books", field=models.ManyToManyField(...)),
    ]

then the loop for book in author.books.all() will access the new (and empty) BookAuthor table instead of iterating over the existing default table Django created.

How can I make the data migration?

The only way I see is to have two releases:

  1. Add the new BookAuthor model behind a new field (books_new) and fill it with data, while keeping the existing books field. Also change every place where author.books is used to author.books_new.
  2. Release + migrate on prod
  3. Remove author.books and rename books_new to books. Another migration, another release.

Isn't there a simpler way?


Solution

  • You don't actually need a data migration at all to add a through table to a many-to-many relation.

    When you create a many-to-many relation without a through, Django creates a virtual model for you behind the scenes:

    >>> from xxxx.models import Author
    >>> Author.books.through
    <class 'xxxx.models.Author_books'>
    >>> Author.books.through._meta.db_table
    'xxxx_author_books'
    >>> Author.books.through._meta.get_fields()
    (<django.db.models.fields.BigAutoField: id>, <django.db.models.fields.related.ForeignKey: author>, <django.db.models.fields.related.ForeignKey: book>)
    >>> Author.books.through._meta.unique_together
    (('author', 'book'),)
    

    Armed with this knowledge, you can create the same table as a real model (nb: I didn't check the exact fields from the virtual through table – you might want more diligence here!).

    The important bit, however, is that you will need to set db_table manually to what the virtual through table's name is.

    class BookAuthor(models.Model):
        book = models.ForeignKey(Book, on_delete=models.CASCADE)
        author = models.ForeignKey("Author", on_delete=models.CASCADE)
    
        class Meta:
            unique_together = ("book", "author")
            db_table = "xxxx_author_books"
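
The db_table value above matches what the shell session printed. As a rough sketch of how that default name is put together, assuming the owning model uses its default table name and the result is short enough that the database backend does not truncate it, the helper below (a hypothetical function, not part of Django) rebuilds it from the app label, model name and field name. Checking Author.books.through._meta.db_table on the real project remains the reliable source.

```python
def implicit_m2m_table_name(app_label: str, model_name: str, field_name: str) -> str:
    """Guess the default table name of an auto-created many-to-many table.

    Assumptions: the owning model has no custom db_table, the field has no
    custom db_table, and the name is not truncated by the backend.
    """
    # Default model table: "<app_label>_<lowercased model name>"
    model_table = f"{app_label}_{model_name.lower()}"
    # Implicit through table: "<model table>_<m2m field name>"
    return f"{model_table}_{field_name}"


print(implicit_m2m_table_name("xxxx", "Author", "books"))  # prints xxxx_author_books
```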
    

    You'll also need to set the through on the ManyToManyField at this point.

    If you create a migration out of this, you will get a CreateModel, but trying to run that migration will understandably fail – you already have a "xxxx_author_books" table.

    You'll need to wrap the operations of this migration in a SeparateDatabaseAndState, so that only Django's migration state changes and the physical database is not touched. It doesn't need to be touched, since all we did was write out the same table that was already created implicitly:

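    migrations.SeparateDatabaseAndState(
        # No database operations: the table already exists.
        database_operations=[],
        # Only Django's view of the models (the migration state) changes.
        state_operations=[
            migrations.CreateModel(...),
            migrations.AlterField(...),
            ...
        ],
    )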
    

    Migrating this should go through without a hitch (and without touching the database).

    You now have a bona fide through model to which you can add the impact field, then generate and run the migration as usual.
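
To illustrate why the existing connections survive without a data migration, here is a minimal sketch using plain sqlite3 rather than Django. It assumes the join table has the columns shown in the shell session above (id, author_id, book_id) and uses made-up ids; the ALTER TABLE statement only approximates what adding impact with default=1 amounts to, since the exact SQL Django emits depends on the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the table Django created implicitly for Author.books
# (column names assumed from the ForeignKey fields author and book).
conn.executescript(
    """
    CREATE TABLE xxxx_author_books (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        author_id INTEGER NOT NULL,
        book_id INTEGER NOT NULL,
        UNIQUE (author_id, book_id)
    );
    INSERT INTO xxxx_author_books (author_id, book_id)
    VALUES (1, 10), (1, 11), (2, 10);
    """
)

# Approximation of adding the new impact column with a default of 1.
conn.execute(
    "ALTER TABLE xxxx_author_books ADD COLUMN impact INTEGER NOT NULL DEFAULT 1"
)

# The existing Author <-> Book rows are still there, now with impact = 1.
rows = conn.execute(
    "SELECT author_id, book_id, impact FROM xxxx_author_books ORDER BY id"
).fetchall()
print(rows)  # prints [(1, 10, 1), (1, 11, 1), (2, 10, 1)]
```

The rows are never copied anywhere: the table stays in place and only gains a column, which is why no RunPython step is involved.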


    EDIT: I just noticed that this operation is described in the Django documentation, too.