python-3.x  alembic  sqlalchemy-migrate

Update column content during Alembic migration


Suppose my db model contains an object User:

from sqlalchemy import Column, String, Unicode
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(String(32), primary_key=True, default=...)
    name = Column(Unicode(100))

and my database contains a users table with n rows. At some point I decide to split the name into firstname and lastname, and during alembic upgrade head I would like my data to be migrated as well.

The auto-generated Alembic migration is as follows:

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('users', sa.Column('lastname', sa.Unicode(length=50), nullable=True))
    op.add_column('users', sa.Column('firstname', sa.Unicode(length=50), nullable=True))

    # Assuming that the two new columns have been committed and exist at
    # this point, I would like to iterate over all rows of the name column,
    # split the string, write it into the new firstname and lastname rows,
    # and once that has completed, continue to delete the name column.

    op.drop_column('users', 'name')                                             

def downgrade():
    op.add_column('users', sa.Column('name', sa.Unicode(length=100), nullable=True))

    # Do the reverse of the above.

    op.drop_column('users', 'firstname')                                        
    op.drop_column('users', 'lastname')

There seem to be multiple, more or less hacky, solutions to this problem. This one and this one both propose using execute() and bulk_insert() to run raw SQL statements during a migration. This (incomplete) solution imports the current db model, but that approach is fragile because it breaks whenever the model changes.
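
For context, that style of approach amounts to something roughly like this (my own sketch, not copied from those answers, using the same imports as above; split_part() is PostgreSQL-specific, and splitting naively on the first space is an assumption):

def upgrade():
    op.add_column('users', sa.Column('lastname', sa.Unicode(length=50), nullable=True))
    op.add_column('users', sa.Column('firstname', sa.Unicode(length=50), nullable=True))
    # Raw SQL data migration. split_part() is PostgreSQL-specific, and this
    # splits on the first space only.
    op.execute(
        "UPDATE users "
        "SET firstname = split_part(name, ' ', 1), "
        "    lastname  = split_part(name, ' ', 2)"
    )
    op.drop_column('users', 'name')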

How do I migrate and modify the existing content of column data during an Alembic migration? What is the recommended way, and where is it documented?


Solution

  • The proposed solution in norbertpy’s answer sounds good at first, but I think it has one fundamental flaw: it would introduce multiple transactions—in between the steps, the DB would be in a funky, inconsistent state. It also seems odd to me (see my comment) that a tool would migrate a DB’s schema without the DB’s data; the two are too closely tied together to separate them.

    After some poking around and several conversations (see the code snippets in this Gist), I’ve settled on the following solution:

    def upgrade():
    
        # Schema migration: add all the new columns.
        op.add_column('users', sa.Column('lastname', sa.Unicode(length=50), nullable=True))
        op.add_column('users', sa.Column('firstname', sa.Unicode(length=50), nullable=True))
    
        # Data migration: takes a few steps...
        # Declare a lightweight table representation for the data migration.
        # Note that it contains both the old column and the new columns!
        t_users = sa.Table(
            'users',
            sa.MetaData(),
            sa.Column('id', sa.String(32)),
            sa.Column('name', sa.Unicode(length=100)), # Old column.
            sa.Column('lastname', sa.Unicode(length=50)), # Two new columns.
            sa.Column('firstname', sa.Unicode(length=50)),
            )
        # Use SQLAlchemy's connection and transaction to work through the data.
        connection = op.get_bind()
        # Select all existing names that need migrating.
        results = connection.execute(sa.select([
            t_users.c.id,
            t_users.c.name,
            ])).fetchall()
        # Iterate over all selected data tuples.
        for id_, name in results:
            # Split the existing name into first and last name at the final
            # space; this assumes every name contains at least one space.
            firstname, lastname = name.rsplit(' ', 1)
            # Update the new columns.
            connection.execute(t_users.update().where(t_users.c.id == id_).values(
                lastname=lastname,
                firstname=firstname,
                ))
    
        # Schema migration: drop the old column.
        op.drop_column('users', 'name')                                             
    

    Two comments about this solution:

    1. As noted in the referenced Gist, newer versions of SQLAlchemy and Alembic use a slightly different notation; see the snippet after this list.
    2. Depending on the database, the code may behave differently. MySQL, for example, does not run the above code as a single transaction, because DDL statements such as ADD COLUMN and DROP COLUMN trigger an implicit commit (see “Statements That Cause an Implicit Commit” in the MySQL documentation). So check how your database handles this.
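
    For example, in SQLAlchemy 1.4 and later, select() takes the columns positionally instead of as a list, so the query above becomes (the rest of the migration is unchanged):

        # SQLAlchemy 1.4+ notation: columns are passed positionally, not as a list.
        results = connection.execute(sa.select(
            t_users.c.id,
            t_users.c.name,
            )).fetchall()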

    The downgrade() function can be implemented similarly.
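
    A sketch of what that downgrade could look like, mirroring the upgrade() above (re-joining first and last name with a single space is an assumption, and is lossy for names that contained consecutive spaces):

    def downgrade():

        # Schema migration: restore the old column.
        op.add_column('users', sa.Column('name', sa.Unicode(length=100), nullable=True))

        # Data migration: declare the same lightweight table representation
        # and re-join the split names.
        t_users = sa.Table(
            'users',
            sa.MetaData(),
            sa.Column('id', sa.String(32)),
            sa.Column('name', sa.Unicode(length=100)),
            sa.Column('lastname', sa.Unicode(length=50)),
            sa.Column('firstname', sa.Unicode(length=50)),
            )
        connection = op.get_bind()
        results = connection.execute(sa.select([
            t_users.c.id,
            t_users.c.firstname,
            t_users.c.lastname,
            ])).fetchall()
        for id_, firstname, lastname in results:
            # Skip None parts, then join what is left with a single space.
            name = ' '.join(part for part in (firstname, lastname) if part)
            connection.execute(t_users.update().where(t_users.c.id == id_).values(
                name=name,
                ))

        # Schema migration: drop the new columns.
        op.drop_column('users', 'firstname')
        op.drop_column('users', 'lastname')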

    Addendum. See the Conditional Migration Elements section in the Alembic Cookbook for examples of pairing schema migration with data migration.