Using SQLAlchemy, I have defined my own TypeDecorator for storing pandas DataFrames in a database, encoded as a JSON string.
class db_JsonEncodedDataFrameWithTimezone(db.TypeDecorator):
    impl = db.Text

    def process_bind_param(self, value, dialect):
        if value is not None and isinstance(value, pd.DataFrame):
            timezone = value.index.tz.zone
            df_json = value.to_json(orient="index")
            data = {'timezone': timezone, 'df': df_json, 'index_name': value.index.name}
            value = json.dumps(data)
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            data = json.loads(value)
            df = pd.read_json(data['df'], orient="index")
            df.index = df.index.tz_localize('UTC')
            df.index = df.index.tz_convert(data['timezone'])
            df.index.name = data['index_name']
            value = df
        return value
This works fine the first time an entity is saved to the database, and loading works too.
The problem comes when I modify the value, i.e. change the DataFrame and try to update the database. When I invoke
db.session.add(entity)
db.session.commit()
I get a traceback which points to comparing values being the problem:
x == y
ValueError: Can only compare identically-labeled DataFrame Objects.
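For context, the error can be reproduced without a database. By default, the compare_values method of a SQLAlchemy type uses the Python == operator, and pandas refuses to apply == to two DataFrames whose labels differ. The snippet below is a minimal sketch; the frames `left` and `right` are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"a": [1, 2]}, index=[0, 2])  # same data, different row labels

try:
    left == right  # element-wise comparison requires identical labels
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# DataFrame.equals returns a single bool and does not raise on a label mismatch
print(left.equals(right))
print(left.equals(left.copy()))
```

Note that even when the labels do match, == returns a DataFrame of booleans rather than a single True or False, so it is not a drop-in equality test for change detection.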
So I suspect my problem has something to do with coercing comparators. I have tried three things; all have failed, and I really don't know what to do next:
# 1st failed solution attempt: inserting
coerce_to_is_types = (pd.DataFrame,)

# 2nd failed solution attempt: inserting
def coerce_compared_value(self, op, value):
    return self.impl.coerce_compared_value(op, value)

# 3rd failed solution attempt
class comparator_factory(db.Text.comparator_factory):
    def __eq__(self, other):
        try:
            value = (self == other).all().all()
        except ValueError:
            value = False
        return value
On my fourth attempt I think I found the answer: I defined my own compare_values method and inserted it in the type class above. This avoids the operator 'x == y' being applied to my DataFrames:
def compare_values(self, x, y):
    # pandas.testing replaces the deprecated pandas.util.testing
    from pandas.testing import assert_frame_equal
    try:
        assert_frame_equal(x, y, check_names=True, check_like=True)
        return True
    except (AssertionError, ValueError, TypeError):
        return False
Another problem of this nature appeared later in my code. The solution was to amend the above to attempt the natural comparison first and, if that fails, fall back to the method above:
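try:
    value = x == y
except Exception:
    # fall back to some other overriding comparison method, such as the one above
    value = frames_equal(x, y)  # hypothetical helper wrapping the assert_frame_equal logic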
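One way to combine the two steps is sketched below as a standalone function, so it can be tried without a database. The name `values_equal` is hypothetical, not part of SQLAlchemy or pandas; a compare_values method on the type could simply delegate to it. The sketch also wraps the natural comparison in bool(), on the assumption that a DataFrame result (which has no single truth value) should trigger the fallback as well:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def values_equal(x, y):
    """Hypothetical helper: try the natural comparison, then fall back to a frame comparison."""
    try:
        # bool() forces a single truth value; a DataFrame result raises ValueError here
        return bool(x == y)
    except (ValueError, TypeError):
        pass
    try:
        # check_like=True ignores the order of index and columns
        assert_frame_equal(x, y, check_names=True, check_like=True)
        return True
    except (AssertionError, ValueError, TypeError):
        return False


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
relabeled = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 2])

print(values_equal(1, 1))                # plain scalars use ==
print(values_equal(df, df.copy()))       # equal frames
print(values_equal(df, df[["b", "a"]]))  # same data, columns reordered
print(values_equal(df, relabeled))       # different row labels
print(values_equal(df, None))            # frame against a missing value
```

Catching specific exceptions rather than using a bare except keeps unrelated errors (for example a KeyboardInterrupt) from being silently treated as "values differ".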