pythonpandasdataframesqlalchemypandas-to-sql

SQLALCHEMY Cannot insert explicit value for identity column in table 'tblsummary' when IDENTITY_INSERT is set to OFF


I'm totally new to Pandas and SQL Alchemy.

I'm trying to write code to import Microsoft Access database to SQL Server.

Using the pyodbc and Pandas dataframe approach it's working fine but it's terribly slow. (sample with only two fields...there's much more fields)

for index, row in df.iterrows():
   cursor.execute("INSERT INTO  tblsummary (myDate, REG_NO) values(?, ?)", mydate, row.REG_NO)
   print(index)
cnxn.commit()

I've read that in can speed up thing with SQL Alchemy and the "fast_executemany=True" on the create_engine.

Here's what i'm doing:

from time import perf_counter
import numpy as np
import pyodbc
import pandas
import sqlalchemy as sa
from sqlalchemy.engine import URL

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\Users\myuser\Sept23.mdb;"
    r"ExtendedAnsiSQL=1;"
)
connection_url = sa.engine.URL.create(
    "access+pyodbc",
    query={"odbc_connect": connection_string}
)
engine = sa.create_engine(connection_url)

df = pandas.read_sql("SELECT * FROM tblSummary", engine, parse_dates=["VOLCOUNTSTARTDATE"])

df = df.replace({np.NaN: None})

connection_string = (
    r"Driver={ODBC Driver 17 for SQL Server};"
    r"Server=XXX\YYY;"
    r"Database=ZZZ;"
    r"Trusted_Connection=yes;"
)
connection_url = URL.create(
    "mssql+pyodbc", 
    query={"odbc_connect": connection_string}
)

engine2 = sa.create_engine(connection_url, fast_executemany=True)

sqlcon = engine2.connect()

sqlcon.execute(sa.sql.text("SET IDENTITY_INSERT tblsummary ON;"))

#need to rename those two field as Django not accepting it in Models.py
df.rename(columns={'CLASS': 'myclass', 'TYPE': 'mytype'}, inplace=True)

#a constant date i need to insert
df.insert(0, 'myDate', pandas.Timestamp('2016-11-06'))

t0 = perf_counter()
df.to_sql("tblsummary", engine2, index = True, index_label="id", if_exists="append")
print(f"{perf_counter() - t0} seconds")

sqlcon.execute(sa.sql.text(f"SET IDENTITY_INSERT  tblsummary OFF;"))
sqlcon.close()

I always got that IntegrityError message telling me to "SET IDENTITY to ON"

I'm trying to do it with that line but without success:

sqlcon.execute(sa.sql.text("SET IDENTITY_INSERT tblsummary ON;"))

IntegrityError                            Traceback (most recent call last) File C:\ProgramData\anaconda3\Lib\site-packages\sqlalchemy\engine\base.py:1934, in Connection._exec_single_context(self, dialect, context, statement, parameters)    1933     if not evt_handled:
-> 1934         self.dialect.do_executemany(    1935             cursor,    1936             str_statement,    1937             effective_parameters,    1938             context,    1939         )   1940 elif not effective_parameters and context.no_parameters:

File C:\ProgramData\anaconda3\Lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py:716, in MSDialect_pyodbc.do_executemany(self, cursor, statement, parameters, context)
    715     cursor.fast_executemany = True
--> 716 super().do_executemany(cursor, statement, parameters, context=context)

File C:\ProgramData\anaconda3\Lib\site-packages\sqlalchemy\engine\default.py:918, in DefaultDialect.do_executemany(self, cursor, statement, parameters, context)
    917 def do_executemany(self, cursor, statement, parameters, context=None):
--> 918     cursor.executemany(statement, parameters)

IntegrityError: ('23000', "[23000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot insert explicit value for identity column in table 'tblsummary' when IDENTITY_INSERT is set to OFF. (544) (SQLParamData);

And the echo on the SQL Server engine.

2023-11-27 19:03:22,777 INFO sqlalchemy.engine.Engine SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)

2023-11-27 19:03:22,779 INFO sqlalchemy.engine.Engine [raw sql] ()

2023-11-27 19:03:22,807 INFO sqlalchemy.engine.Engine SELECT schema_name()

2023-11-27 19:03:22,809 INFO sqlalchemy.engine.Engine [generated in
0.00132s] ()

2023-11-27 19:03:22,872 INFO sqlalchemy.engine.Engine SELECT CAST('test max support' AS NVARCHAR(max))

2023-11-27 19:03:22,874 INFO sqlalchemy.engine.Engine [generated in
0.00134s] ()

2023-11-27 19:03:22,895 INFO sqlalchemy.engine.Engine SELECT 1 FROM fn_listextendedproperty(default, default, default, default, default, default, default)

2023-11-27 19:03:22,897 INFO sqlalchemy.engine.Engine [generated in
0.00181s] ()

2023-11-27 19:03:22,937 INFO sqlalchemy.engine.Engine BEGIN (implicit)

2023-11-27 19:03:22,937 INFO sqlalchemy.engine.Engine SET IDENTITY_INSERT tblsummary ON;

2023-11-27 19:03:22,938 INFO sqlalchemy.engine.Engine [generated in
0.00197s] ()

2023-11-27 19:03:23,126 INFO sqlalchemy.engine.Engine BEGIN (implicit)

2023-11-27 19:03:23,175 INFO sqlalchemy.engine.Engine SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]  FROM [INFORMATION_SCHEMA].[TABLES]  WHERE ([INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) OR [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max))) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))

2023-11-27 19:03:23,177 INFO sqlalchemy.engine.Engine [generated in
0.00162s] ('BASE TABLE', 'VIEW', 'tblsummary', 'dbo')

2023-11-27 19:03:23,905 INFO sqlalchemy.engine.Engine INSERT INTO tblsummary (id, [myDate], [REG_NO], [OFF_NO], [OFFICE], [ZONE], [INSTALN], [RES_ROUTE], [GENUS], [TRIP], mytype, myclass, [ROUTEMODE], [MMC], [VEHICLE], [RWOOT], [XT_VHCLE], [XT_OTHER], [XT_WAIT], [XT_RIDE], [CUC], [CUCT], [HCALLS], [HALLOW], [SPECT], [ASSIST], [GROWTH], [VVAL_100P], [VVAL_RCOV], [TOTNVAR], [NVVAL], [PTACT], [OBTP_BVAL], [SUBD_BVAL], [PREP_BVAL], [SORT_BVAL], [TOT_BVAL], [TOT_ASSMT], [TOT_WKLOAD], [G_GROWTH], [G_INSIDE], [G_DELIVERY], [G_HALLOW], [CONTROL], [LC_SUPPORT], [RES_DATE], [VOLCOUNTSTARTDATE], [EVALVERSION], [RESTRUCTUREENDDATE]) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

2023-11-27 19:03:23,905 INFO sqlalchemy.engine.Engine [generated in
0.50666s] [(0, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0011', 'C', '1',
6.0, 1.0, 2.0, None, 'TWT', 58.35, 17.0, 0.0, 0.0, 11.72, 0.0, 0.0, 608.0, 1.71, 14.53, 0.0, 0.0, 292.65, 131.2, 128.24, 169.69, 608.0, 5.98, 4.9, 23.99, 27.02, 362.79, 466.14, 466.14, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (1, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0012', 'C', '1', 12.0, 1.0, 1.0, None, None, 58.35, 0.0, 0.0, 0.0, 12.17, 0.0, 0.0, 639.0, 2.41, 0.0, 0.0,
0.0, 306.45, 116.06, 207.07, 240.23, 639.0, 5.54, 6.81, 17.52, 18.55, 404.73, 477.67, 477.67, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (2, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0013', 'C', '1', 1.0, 1.0, 1.0, None, None, 58.35, 0.0, 0.0, 0.0, 11.92, 0.0, 0.0, 786.0, 2.2, 0.88, 0.0,
0.0, 271.6, 97.58, 222.1, 249.43, 786.0, 5.68, 8.91, 18.42, 21.0, 401.02, 474.4, 474.4, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (3, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0014', 'C', '1', 1.0, 1.0, 1.0, None, None, 58.35, 0.0, 0.0, 0.0, 11.3, 0.0, 0.0, 767.0, 2.26, 0.0, 0.0,
0.0, 267.46, 93.79, 237.31, 258.7, 767.0, 5.61, 7.41, 17.8, 19.83, 403.15, 475.16, 475.16, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (4, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0015', 'C', '1', 1.0, 1.0, 1.0, None, None, 58.35, 0.0, 0.0, 0.0, 5.93, 0.0, 0.0, 813.0, 1.39, 0.88, 0.0,
0.0, 247.64, 82.66, 223.86, 269.57, 813.0, 5.71, 9.39, 18.73, 21.18, 407.24, 473.89, 473.89, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (5, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0016', 'C', '1', 1.0, 1.0, 1.0, None, None, 58.35, 0.0, 0.0, 0.0, 7.09, 0.0, 0.0, 723.0, 2.09, 0.97, 0.0,
0.0, 303.54, 115.84, 208.27, 238.81, 723.0, 5.72, 8.4, 18.82, 21.47, 409.05, 477.57, 477.57, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (6, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0017', 'C', '1', 12.0, 2.0, 1.0, None, None, 13.35, 0.0, 0.0, 0.0, 3.42, 0.0, 0.0, 345.0, 2.24, 0.0, 0.0,
0.0, 164.48, 69.26, 115.8, 130.57, 345.0, 5.24, 4.49, 13.23, 12.0, 234.79, 253.83, 253.83, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0)), (7, datetime.datetime(2016, 11, 6, 0, 0), 1, 'A001', 'AMHERST LCD MAIN', '', 'AMHERST LCD MAIN', '   LC0018', 'C', '1', 12.0, 2.0, 1.0, None, None, 13.35, 0.0, 0.0, 0.0, 5.07, 0.0, 0.0, 366.0, 2.78, 1.58, 0.0,
0.0, 182.13, 80.95, 108.57, 123.68, 366.0, 5.33, 4.47, 13.02, 13.71, 241.15, 263.96, 263.96, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '1008.1003', datetime.datetime(2022, 4, 22, 0, 0))  ... displaying 10 of 13310 total bound parameter sets ...  (13308, datetime.datetime(2016, 11, 6, 0, 0), 5, 'E084', 'BURNABY LCD NORTH FRASER 2', '', 'BURNABY LCD NORTH FRASER 2', '   LC0244', 'C', '1', None, 1.0, 2.0, None, 'TWT', 58.35, 34.46, 0.0, 0.0, 26.49, 0.0, 0.0,
858.0, 14.17, 47.62, 0.0, 0.0, 251.82, 129.76, 104.33, 138.2, 858.0, 0.0, 9.27, 16.33, 0.0, 294.63, 474.65, 474.65, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '3503.3501', datetime.datetime(2022, 5, 6, 0, 0)), (13309, datetime.datetime(2016, 11, 6, 0, 0), 5, 'E084', 'BURNABY LCD NORTH FRASER 2', '', 'BURNABY LCD NORTH FRASER 2', '   LC0246', 'C', '1', None, 2.0, 2.0, None, 'TWT', 13.35, 34.46, 0.0, 0.0, 25.7, 0.0, 0.0,
285.0, 5.24, 23.69, 0.0, 0.0, 111.31, 70.52, 62.88, 94.1, 285.0, 0.0, 2.85, 4.24, 0.0, 171.7, 274.15, 274.15, 0.0, 0.0, 0.0, 0.0, '', 0.0, datetime.datetime(2022, 8, 15, 0, 0), Timestamp('2022-03-07 00:00:00'), '3503.3501', datetime.datetime(2022, 5, 6, 0, 0))]

2023-11-27 19:03:43,360 INFO sqlalchemy.engine.Engine COMMIT

Solution

  • I think the problem might be that you are using engine2 instead of the connection you made so you are probably working on 2 diff connections. Ie. df.to_sql("tblsummary", engine2, should be df.to_sql("tblsummary", sqlcon,. But can you try an explicit transaction like this?

    engine2 = sa.create_engine(connection_url, fast_executemany=True)
    
    with engine2.begin() as sqlcon:
    
        sqlcon.execute(sa.sql.text("SET IDENTITY_INSERT tblsummary ON;"))
    
        #need to rename those two field as Django not accepting it in Models.py
        df.rename(columns={'CLASS': 'myclass', 'TYPE': 'mytype'}, inplace=True)
    
        t0 = perf_counter()
        df.to_sql("tblsummary", sqlcon, index = True, index_label="id", if_exists="append")
        print(f"{perf_counter() - t0} seconds")
    
        sqlcon.execute(sa.sql.text(f"SET IDENTITY_INSERT  tblsummary OFF;"))
    
        # transaction is committed and connection returned to engine
    

    Here is the documentation explaining this usage: connect-and-begin-once-from-the-engine

    If you really wanted to fully understand what went wrong you could also turn on the engine pool echo flag, sqlalchemy.pool.Pool.params.echo. I think two connections that each have their own transaction.