pythonsqlalchemymarshmallowmarshmallow-sqlalchemy

Issue with SQLAlchemy - Marshmallow nested object inferring foreign key


I'm trying to get Marshmallow-SQLAlchemy to deserialize an object with a nested object without specifying the foreign key for the nested object (which should be the primary key of the parent object). Here's a standalone example:

# Python version == 3.8.2
from datetime import datetime
import re

# SQLAlchemy == 1.3.23
from sqlalchemy import func, create_engine, Column, ForeignKey, Text, DateTime
from sqlalchemy.ext.declarative import as_declarative, declared_attr
from sqlalchemy.orm import relationship, sessionmaker

# marshmallow==3.10.0
# marshmallow-sqlalchemy==0.24.2
from marshmallow import fields
from marshmallow.fields import Nested
from marshmallow_sqlalchemy import SQLAlchemyAutoSchema

################################################################################
# Set up
################################################################################

engine = create_engine("sqlite:///test.db")

Session = sessionmaker()
Session.configure(bind=engine)
session = Session()


################################################################################
# Models
################################################################################

@as_declarative()
class Base(object):

    @declared_attr
    def __tablename__(cls):
        # From https://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case
        name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', cls.__name__)
        return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

    @declared_attr
    def updated(cls):
        return Column(DateTime, default=func.now(), onupdate=func.now(), nullable=False)


class Account(Base):
    id = Column(Text, primary_key=True)
    name = Column(Text, nullable=False)
    tags = relationship("AccountTag", backref="account")


class AccountTag(Base):
    account_id = Column(Text, ForeignKey('account.id'), primary_key=True)
    Key = Column(Text, primary_key=True)
    Value = Column(Text, nullable=False)


################################################################################
# Schemas
################################################################################

class AutoSchemaWithUpdate(SQLAlchemyAutoSchema):
    class Meta:
        load_instance = True
        sqla_session = session
    updated = fields.DateTime(default=lambda: datetime.now())


class AccountSchema(AutoSchemaWithUpdate):
    class Meta:
        model = Account
        include_relationships = True

    tags = Nested("AccountTagSchema", many=True)


class AccountTagSchema(AutoSchemaWithUpdate):
    class Meta:
        model = AccountTag
        include_fk = True


################################################################################
# Test
################################################################################

Base.metadata.create_all(engine)

account_object = AccountSchema().load({
        "id": "ABC1234567",
        "name": "Account Name",
        "tags": [
            {
                "Value": "Color",
                "Key": "Blue"
            }
        ]
    })

session.merge(account_object)

session.commit()

And here's the error I'm getting:

Traceback (most recent call last):
  File "example.py", line 88, in <module>
    account_object = AccountSchema().load({
  File "C:\python\site-packages\marshmallow_sqlalchemy\schema\load_instance_mixin.py", line 92, in load
    return super().load(data, **kwargs)
  File "C:\python\site-packages\marshmallow\schema.py", line 727, in load
    return self._do_load(
  File "C:\python\site-packages\marshmallow\schema.py", line 909, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'tags': {0: {'account_id': ['Missing data for required field.']}}}

I feel like I'm trying to do something intuitive but I'm not sure anymore. I'm sure that I'm close here but am having no luck getting this to work. Help is much appreciated.


Solution

  • You are getting the error because you've specified include_fk in the Meta class for AccountTagSchema.

    You can inspect the fields that have been generated for the schema:

    print(AccountTagSchema._declared_fields["account_id"])
    # <fields.String(default=<marshmallow.missing>, attribute=None, validate=[], required=True, load_only=False, dump_only=False, missing=<marshmallow.missing>, allow_none=False, error_messages={'required': 'Missing data for required field.', 'null': 'Field may not be null.', 'validator_failed': 'Invalid value.', 'invalid': 'Not a valid string.', 'invalid_utf8': 'Not a valid utf-8 string.'})>
    

    Notice that it generates account_id with required=True, this is due to the sqlalchemy column that it represents being NOT NULL as it is part of the primary key.

    So the simplest thing is to remove include_fk from the schema meta:

    class AccountTagSchema(AutoSchemaWithUpdate):
        class Meta(AutoSchemaWithUpdate.Meta):
            model = AccountTag
            #  include_fk = True  <--- remove
    

    ...however, run the script and you'll run into another problem:

    sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.dict' is not mapped

    This means that we end up passing a dict to SQLAlchemy session where it is expecting a mapped Base subclass.

    The reason for this is that when a child class inherits from a base schema, e.g., AutoSchemaWithUpdate in this case, the child class does not automatically inherit the parent's meta configuration. The docs provide a couple of strategies for this, the simplest being that the child's Meta class should also inherit from the parent's Meta class:

    class AccountSchema(AutoSchemaWithUpdate):
        class Meta(AutoSchemaWithUpdate.Meta):  # <--- this here
            model = Account
            include_relationships = True
    
        tags = Nested("AccountTagSchema", many=True)
    

    Once we do that for both AccountSchema and AccountTagSchema we are ready to run the script again and it works...the first time. Immediately run the script again, and another error occurs:

    AssertionError: Dependency rule tried to blank-out primary key column 'account_tag.account_id' on instance '<AccountTag at 0x7f14b0f9b670>'

    This is a consequence of the design decision to have the loaded AccountTag instances unidentifiable (i.e., excluding the primary key from the payload) and the decision to include the foreign key field as part of the primary key for AccountTag.

    SQLAlchemy cannot identify that the newly created AccountTag instances are the same as the ones that already exist, so it first tries to disassociate the original account tags from the account by setting the value of the foreign key field to None. However, this isn't allowed as the foreign key is also the primary key and cannot be set NULL.

    The solution for this is described here and involves setting an explicit cascade on the relationship:

    class Account(Base):
        id = Column(Text, primary_key=True)
        name = Column(Text, nullable=False)
        tags = relationship("AccountTag", backref="account", cascade="all,delete-orphan")
    

    Now run the script again, and it will work every time.