
SQLAlchemy TypeDecorators and Comparison Errors


Using SQLAlchemy, I have defined my own TypeDecorator for storing pandas DataFrames in a database, encoded as a JSON string.

class db_JsonEncodedDataFrameWithTimezone(db.TypeDecorator):
    impl = db.Text

    def process_bind_param(self, value, dialect):
        if value is not None and isinstance(value, pd.DataFrame):
            timezone = value.index.tz.zone
            df_json = value.to_json(orient="index")
            data = {'timezone': timezone, 'df': df_json, 'index_name': value.index.name}
            value = json.dumps(data)
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            data = json.loads(value)
            df = pd.read_json(data['df'], orient="index")
            df.index = df.index.tz_localize('UTC')
            df.index = df.index.tz_convert(data['timezone'])
            df.index.name = data['index_name']
            value = df
        return value

This works fine the first time an entity is saved to the database, and loading works too.

The problem comes when I modify the value, i.e. change the DataFrame and try to update the database. When I invoke

db.session.add(entity)
db.session.commit()

I get a traceback which points to a comparison of values as the problem:

x == y
ValueError: Can only compare identically-labeled DataFrame Objects.
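
For reference, a minimal sketch of how this error can be reproduced outside SQLAlchemy. The frames `old` and `new` are made-up examples, assumed to stand in for the loaded value and the modified value; they are not taken from my actual data.

```python
import pandas as pd

# Two frames whose row labels differ, as happens once rows have been added.
old = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
new = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])

try:
    old == new  # element-wise comparison needs identical labels
    raised = False
except ValueError:
    raised = True

# DataFrame.equals does not raise on mismatched labels; it returns a plain bool.
same = old.equals(new)
```

Even when the labels do match, `old == new` yields a DataFrame of booleans rather than a single True/False, so the plain operator is a poor fit for a "has this value changed?" check.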

So I suspect my problem has something to do with coercing comparators. I have tried three things; all have failed and I really don't know what to try next:

# 1st failed solution attempt: inserting
coerce_to_is_types = (pd.DataFrame,)

# 2nd failed solution attempt: inserting
def coerce_compared_value(self, op, value):
    return self.impl.coerce_compared_value(op, value)

# 3rd failed solution attempt
class comparator_factory(db.Text.comparator_factory):
    def __eq__(self, other):
        try:
            value = (self == other).all().all()
        except ValueError:
            value = False
        return value

Solution

  • On my fourth attempt I think I found the answer: I defined my own compare_values method and inserted it in the type class above. This avoids the operator 'x == y' being evaluated on my DataFrames:

    def compare_values(self, x, y):
        # pandas.util.testing was deprecated and later removed; pandas.testing is the public module
        from pandas.testing import assert_frame_equal
        try:
            assert_frame_equal(x, y, check_names=True, check_like=True)
            return True
        except (AssertionError, ValueError, TypeError):
            return False
    

    A similar problem later appeared elsewhere in my code. The solution was to amend the above to attempt the natural comparison first and, if that failed, to fall back to the method above:

    try:
        value = x == y
    except Exception:
        # fall back to some other comparison method, such as the one above
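
One way the two pieces could fit together is sketched below. It is written as standalone functions so it runs without a database; the names `frames_equal` and `compare_values` here are illustrative, and wrapping the result in `bool()` is my own assumption rather than part of the original answer. Inside the type class, the body of `compare_values(self, x, y)` would do the same thing.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal(x, y):
    """Fallback comparison: True only if both are DataFrames with equal content."""
    if not (isinstance(x, pd.DataFrame) and isinstance(y, pd.DataFrame)):
        return False
    try:
        # check_like=True ignores the order of rows and columns
        assert_frame_equal(x, y, check_names=True, check_like=True)
        return True
    except (AssertionError, ValueError, TypeError):
        return False


def compare_values(x, y):
    """Try the natural comparison first, then fall back to frames_equal."""
    try:
        # bool() forces a single truth value; a DataFrame result raises ValueError
        return bool(x == y)
    except (ValueError, TypeError):
        return frames_equal(x, y)


a = pd.DataFrame({"a": [1, 2]})
b = a.copy()
c = pd.DataFrame({"a": [1, 2, 3]})

unchanged = compare_values(a, b)
changed = compare_values(a, c)
```

With this shape, ordinary values such as strings or None are handled by the natural comparison, and only the DataFrame cases reach the fallback.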