python, pandas, ms-access, pyodbc, sqlalchemy-access

Pyodbc Connection to Access, creating table with Pandas to_sql(method='multi') throwing error


I've installed sqlalchemy-access so that I'm able to use pandas and pyodbc to query my Access DBs.

The issue is that it's incredibly slow because it does single-row inserts. Another post suggested using method='multi', and while that seems to work for whoever asked that question, it throws a CompileError for me.

CompileError: The 'access' dialect with current database version settings does not support in-place multirow inserts.

AttributeError: 'CompileError' object has no attribute 'orig'

import pandas as pd
import pyodbc
import urllib.parse
from sqlalchemy import create_engine

# accessDB is the path to the .accdb file (defined elsewhere)
connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    rf"DBQ={accessDB};"
    r"ExtendedAnsiSQL=1;"
)

connection_uri = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
engine = create_engine(connection_uri)
conn = engine.connect()

# Read in the Tableau SuperStore data (ssData is the path to the Excel file)
dfSS = pd.read_excel(ssData)

dfSS.to_sql('SuperStore', conn, index=False, method='multi')

Solution

  • Access SQL doesn't support multi-row inserts, so to_sql will never be able to support them either. That other post is probably using SQLite.

    Instead, you can write the data frame to CSV and insert the CSV by using a query (see the CSV sketch at the end of this answer).

    Or, of course, don't read the Excel file in Python at all, but insert the Excel file by query. This will always be much faster, as Access can read the data directly instead of Python reading it and then transmitting it.

    E.g.

    INSERT INTO SuperStore 
    SELECT * FROM [Sheet1$] IN "C:\Path\To\File.xlsx"'Excel 12.0 Macro;HDR=Yes'
    

    You should be able to execute this using pyodbc without needing to involve sqlalchemy, as sketched below. Do note the double- and single-quote combination; quotes can be a bit painful when embedding them in other programming languages.
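
    For example, a minimal pyodbc sketch of that approach. The database path, workbook path and sheet name below are placeholders, and the SuperStore table is assumed to already exist:

    import pyodbc

    # Placeholder paths -- substitute your own .accdb and .xlsx locations
    access_db = r"C:\Path\To\Database.accdb"
    excel_file = r"C:\Path\To\File.xlsx"

    conn_str = (
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
        rf"DBQ={access_db};"
        r"ExtendedAnsiSQL=1;"
    )

    # Build the INSERT ... SELECT shown above; note the double quotes around
    # the workbook path and the single quotes around the Excel connect string.
    # If the target table doesn't exist yet, a make-table query
    # ("SELECT * INTO SuperStore FROM [Sheet1$] IN ...") creates it instead.
    sql = (
        "INSERT INTO SuperStore "
        "SELECT * FROM [Sheet1$] "
        f"""IN "{excel_file}" 'Excel 12.0 Macro;HDR=Yes'"""
    )

    cnxn = pyodbc.connect(conn_str)
    crsr = cnxn.cursor()
    crsr.execute(sql)
    cnxn.commit()
    cnxn.close()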
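
    And if the data only exists as a DataFrame in Python, a rough sketch of the CSV route, assuming hypothetical paths; the bracketed Text-driver connect string is an assumption and, depending on your column types, may need a schema.ini file next to the CSV:

    import pandas as pd
    import pyodbc

    # Placeholder paths -- adjust to your environment
    access_db = r"C:\Path\To\Database.accdb"
    csv_folder = r"C:\Path\To"
    csv_name = "SuperStore.csv"

    # dfSS is the DataFrame from the question
    dfSS = pd.read_excel(r"C:\Path\To\SuperStore.xlsx")

    # 1) Dump the DataFrame to CSV so Access can read it directly
    dfSS.to_csv(rf"{csv_folder}\{csv_name}", index=False)

    # 2) A single INSERT ... SELECT pulls the whole file in; the Text
    #    connect string below is an assumption and may need tweaking
    sql = (
        "INSERT INTO SuperStore "
        f"SELECT * FROM [Text;FMT=Delimited;HDR=Yes;DATABASE={csv_folder}].[{csv_name}]"
    )

    cnxn = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
        rf"DBQ={access_db};"
    )
    crsr = cnxn.cursor()
    crsr.execute(sql)
    cnxn.commit()
    cnxn.close()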