pythonvalidationsqlalchemymaxlength

How to validate automatically String/Unicode columns maximum length when specified on declaration?


SQLAlchemy allows to specify a length when declaring String columns:

foo = Column(String(10))

as in SQL:

foo VARCHAR(10)

I know this length value is used by some DBMS to allocate memory when creating a row in a table. But some DBMS (like SQLite) do not take care of it and accept this syntax only for compatibility with SQL standard. But some DBMS (like MySQL) require it to be specified.

Personally, I like to specify a maximum length for some text data because it helps to design UIs as you know the area needed to display it.

Moreover, I think it will make my application behavior more consistent across the different DBMS.

So, I want to validate the value of String/Unicode columns on insertion by checking its length against the declared one (when length as been declared).

CheckConstraint

The first solution is to use a check constraint:

from sqlalchemy import CheckConstraint, Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///:memory:", echo=True)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)


class Foo(Base):
    __tablename__ = "Foo"

    id = Column(Integer, primary_key=True)
    bar = Column(String(10), CheckConstraint("LENGTH(bar) < 10"))


Base.metadata.create_all()

if __name__ == "__main__":
    session = Session()
    session.add(Foo(bar="a" * 20))

    try:
        session.commit()
    except IntegrityError as e:
        print(f"Failed with: {e.orig}")

It works but the SQL constraint expression is not generated by SQLAlchemy. So it may require some custom generation if a DBMS needs a different syntax.

Validator

I also tried to use an SQLAlchemy validator:

class Foo(Base):
    __tablename__ = "Foo"

    id = Column(Integer, primary_key=True)
    bar = Column(String(10))

    @validates("bar")
    def check_bar_length(self, key, value):
        column_type = getattr(type(self), key).expression.type
        max_length = column_type.length

        if len(value) > max_length:
            raise ValueError(
                f"Value '{value}' for column '{key}' "
                f"exceed maximum length of '{max_length}'"
            )

        return value
try:
    Foo(bar="a" * 20)
except ValueError as e:
    print(f"Failed with: {e}")

Now, the maximum length is inferred from the declared one.

The check is done on entity creation, not on commit. I don't know if it can be a problem.

Custom type

Both solutions shown above need to apply the validation on each column. I'm looking for a solution to perform automatically the check on String/Unicode columns having a declared length.

Using a custom type might be the solution. But it looks like a ugly hack as custom types are not made for data validation but for data transformation.

So, do you think about another solution, maybe an SQLAlchemy feature I don't know about, which will help me to add the check automatically to all String columns where a length is specified ?


Solution

  • I found a solution which seems to fit my needs.But I think the way I add the constraint is a bit hacky.

    It involves the usage of:

    Entity declaration

    The entity is declared as usual without the need to specify any constraint:

    from sqlalchemy import Column, Integer, LargeBinary, String, Unicode, 
    
    class Foo(Entity):
        __tablename__ = "Foo"
    
        id = Column(Integer, primary_key=True)
        string_without_length = Column(String())
        string_with_length = Column(String(10))
        unicode_with_length = Column(Unicode(20))
        binary = Column(LargeBinary(256))
    

    Constraint attaching

    The constraint is attached to the columns before the class is instrumented:

    from sqlalchemy import CheckConstraint, func, String
    from sqlalchemy.event import listen_for
    from sqlalchemy.orm import mapper
    
    @listens_for(mapper, "instrument_class")
    def add_string_length_constraint(mapper, cls):
        table = cls.__table__
    
        for column in table.columns:
            if isinstance(column.type, String):
                length = column.type.length
    
                if length is not None:
                    CheckConstraint(
                        func.length(column) <= length,
                        table=column,
                        _autoattach=False,
                    )
    

    Resulting DDL statement (SQLite)

    CREATE TABLE "Foo" (
        id INTEGER NOT NULL, 
        string_without_length VARCHAR, 
        string_with_length VARCHAR(10) CHECK (length(string_with_length) <= 10), 
        unicode_with_length VARCHAR(20) CHECK (length(unicode_with_length) <= 20), 
        binary BLOB, 
        PRIMARY KEY (id)
    )
    

    Implementation Details

    @listens_for(mapper, "instrument_class")
    

    The instrument_class event occurs when the mapper for an instrumented class is created but not fully initialized. It can be listened on your base declarative class (created with declarative_base()) or directly on the slqalchemy.orm.mapper class.

    if isinstance(column.type, String):
    

    Only String (and subclasses like Unicode) columns...

    if length is not None:
    

    ...whose length is set are considered.

    CheckConstraint(
        func.length(column) <= length,
        table=column,
        _autoattach=False,
    )
    

    The constraint is generated using SQLAlchemy expressions.

    Finally, the hacky part:

    When creating the constraint, SQLAlchemy automatically attach it to the table (I think it detects the column concerned by the constraint).

    As I want it to be generated as part of the column definition, I disable this auto-attachement using _autoattach=False, then I specify the column using table=column.

    If you don't care about it, just ignore those arguments:

    CheckConstraint(func.length(column) <= length)
    

    The resulting DDL statement will be:

    CREATE TABLE "Foo" (
        id INTEGER NOT NULL, 
        string_without_length VARCHAR, 
        string_with_length VARCHAR(10), 
        unicode_with_length VARCHAR(20), 
        binary BLOB, 
        PRIMARY KEY (id), 
        CHECK (length(string_with_length) <= 10), 
        CHECK (length(unicode_with_length) <= 20)
    )