sqlalchemy hex varbinary memoryview

Reading VARBINARY via SQLAlchemy


I am working with a database table that has a VARBINARY column. I want to read the table into a pandas DataFrame via SQLAlchemy. Going the usual way

df = pandas.read_sql_query("select key from xxx", engine)

I get an unreadable memoryview as the data type. I can convert it with a lambda function

df.key.apply(lambda x: x.tobytes().hex())

into the desired readable format. But I would like to know whether the cast can also be placed directly into the pandas.read_sql_query() statement?

Many thanks in advance.


Solution

  • I am not sure whether this will help you, but thanks to @JGFMK's response I was able to come up with something similar in my program:

    1. Declared the column with Mapped so that the SQL Server DATA_TYPE (VARBINARY) is converted to a Python type (bytes).
    2. Added a property that decodes the value with the encoding matching the collation my SQL Server uses (shown under Properties > General for the database).

    This was the result:

        # `db_sql.VARBINARY` informs the `DATA_TYPE` from the table field.
        # Mapped does the conversion from table type to python type.
        order_description_binary: Mapped[bytes] = db_sql.Column('TJ_OBSERVA', db_sql.VARBINARY, nullable=True, default=None)
    
        @property
        def ordem_description(self):
            if self.order_description_binary is not None:
                try:
                    description_text = self.order_description_binary.decode('latin1')
                    description_text = description_text.replace('\x00', '')
                    return description_text
                except UnicodeDecodeError as e:
                    print(f"Error decoding order_description_binary: {e}")
                    return None
            else:
                return None
    

    So, in your case, you can try using decode together with replace on the DataFrame column that holds the VARBINARY data.

    Example:

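    df = pandas.read_sql_query("select key from xxx", engine)
    # A Series has no .decode(), so convert each memoryview/bytes value individually
    df["key"] = df["key"].apply(lambda x: bytes(x).decode('latin1').replace('\x00', '') if x is not None else None)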
    

    I believe something like that would work.
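
As for the original question of placing the cast inside the query itself: one option is to let the database render the binary column as hex text, so that pandas receives ordinary strings. The sketch below is only an illustration under stated assumptions. It uses an in-memory SQLite database as a stand-in for the real one (the table `xxx` and column `key` come from the question, the sample value is made up), and SQLite's `hex()` function; the helper `to_hex` is a hypothetical name. It also shows the client-side conversion for comparison.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the real VARBINARY table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE xxx ("key" BLOB)')
conn.execute('INSERT INTO xxx ("key") VALUES (?)', (bytes.fromhex("deadbeef"),))

# SQL-side conversion: the database returns hex text instead of raw bytes.
# hex() is SQLite's function and yields uppercase, hence lower().
sql_side = conn.execute('SELECT lower(hex("key")) AS "key" FROM xxx').fetchall()


def to_hex(value):
    """Client-side conversion: accept bytes or memoryview, pass None through."""
    if value is None:
        return None
    return bytes(value).hex()


# Client-side conversion of the raw column. The value is wrapped in a
# memoryview to mimic what some drivers hand back for binary columns.
raw = conn.execute('SELECT "key" FROM xxx').fetchall()
client_side = [to_hex(memoryview(row[0])) for row in raw]

print(sql_side, client_side)
```

The same idea should carry over to `pandas.read_sql_query(query, engine)` by putting the conversion function in the query string. The function name depends on the database: to my knowledge PostgreSQL offers `encode(key, 'hex')`, SQL Server `CONVERT(VARCHAR(MAX), key, 2)`, and MySQL `HEX(key)`, but check the documentation for your server before relying on any of them.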