I am having problems loading data into an Access database. For testing purposes I built a little convert function that takes all datasets from an HDF file and writes them into the .accdb. Without the @event.listens_for(engine, "before_cursor_execute")
listener it works, but veeery slowly. With it, I get odd behavior: it creates only one empty table (from the first df) in the db and then execution finishes. The for-loop never completes and no error is raised.
Maybe it's because the sqlalchemy-access
package doesn't support fast_executemany, but I couldn't find any related information about it. Does anyone have some input on how I can solve this, or on how to write data into the db in a faster way?
Big thanks!
import urllib.parse

import pandas as pd
from pathlib import Path
from sqlalchemy import create_engine, event

# PATHS
HOME = Path(__file__).parent
DATA_DIR = HOME / 'output'
FILE_ACCESS = DATA_DIR / 'db.accdb'
FILE_HDF5 = DATA_DIR / 'Data.hdf'

# FUNCTIONS
def convert_from_hdf_to_accb():
    # https://github.com/gordthompson/sqlalchemy-access/wiki/Getting-Connected
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    conn_str = 'DRIVER={};DBQ={};'.format(driver, FILE_ACCESS)
    conn_url = "access+pyodbc:///?odbc_connect={}".format(urllib.parse.quote_plus(conn_str))

    # https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990
    # https://github.com/pandas-dev/pandas/issues/15276
    # https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc
    engine = create_engine(conn_url)

    @event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    with pd.HDFStore(path=FILE_HDF5, mode="r") as store:
        for key in store.keys():
            df = store.get(key)
            df.to_sql(name=key, con=engine, index=False, if_exists='replace')

    print(' IT NEVER REACHES AND DOESNT RAISE AN ERROR :( ')

# EXECUTE
if __name__ == "__main__":
    convert_from_hdf_to_accb()
Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany
That is true. pyodbc's fast_executemany
feature requires that the driver support an internal ODBC mechanism called "parameter arrays", and the Microsoft Access ODBC driver does not support them.
See also
https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany