I am bulk inserting data into crateDB using
pandas.to_sql(tableId, 'crate://xxxxxxx:4200', if_exists='append', index=False, chunksize=20000)
The data contains a few faulty rows where one column of type 'numeric' contains a value of type 'object' / 'string'.
This results in a typecasting error while inserting.
When I set the chunksize parameter of to_sql() to 1, an exception is raised and I can catch the issue.
When chunksize is > 0, to_sql()
just continues and I cannot make sure that data was inserted properly.
What am I missing, and what would be a good approach to fix this problem?
The problem most likely is that pandas doesn't return anything on to_sql()
also see https://github.com/pandas-dev/pandas/issues/23998 and https://github.com/pandas-dev/pandas/issues/42645 . Since the bulk_insert in CrateDB doesn't completely fail, but only skips the faulty rows, no exception is raised, but only an array with a value per row is returned. That return however never is processed in pandas :/
One could change the SQL alchemy implementation of CrateDB, but this would in return break any implementation that actually uses the results array.