[python] [sqlalchemy] [temp-tables] [sap-ase] [executemany]

Speeding up insert into temp table in Sybase using sqlalchemy


I am getting extremely poor performance when inserting into a Sybase temp table: on the order of tens of seconds for only 1000 rows. Inserting 1000 rows takes 46 seconds, and 10000 rows takes multiple minutes. I will have about 100k rows. Is there any way to speed this insert up? Could it be that the temp table in Sybase is configured incorrectly? If so, what do I need to ask my DBA about the configuration?

Code:

    import time

    engine = get_connection(db_constants.DB_RISK)
    tmp_table_name = "#TMP"
    data = []
    for i in range(1000):
        values = ['AL' + str(i)]
        data.append(values)
    with engine.connect() as conn:
        conn.execute("CREATE TABLE " + tmp_table_name + " (alias_id varchar(255));")
        start = time.time()
        for i in range(0, len(data), 1000):
            loop_start = time.time()
            with conn.begin():
                stmt = "INSERT INTO " + tmp_table_name + " (alias_id) VALUES (?)"
                conn.execute(stmt, data[i:i + 1000])
            loop_end = time.time()
            print("batch " + str(i) + " time elapsed : " + str(loop_end - loop_start))
        end = time.time()
        print("total time elapsed : " + str(end - start))

Solution

  • Use fast_executemany=True.

    pyodbc defaults to fast_executemany=False because not all ODBC drivers support it. However, the SAP ASE ODBC driver does support it. While the following code takes about 11 seconds to execute against a local SAP ASE instance,

    import os
    import time
    
    import sqlalchemy as sa
    
    connection_url = f"sybase+pyodbc://sa:{os.environ['sa_PWD']}@mydb_dsn_sybase"
    
    engine = sa.create_engine(connection_url)
    
    table_name = "#tmp"
    
    rows_to_insert = 10_000
    data = [{"alias_id": f"alias_{x}"} for x in range(rows_to_insert)]
    
    with engine.begin() as conn:
        conn.exec_driver_sql(f"CREATE TABLE {table_name} (alias_id varchar(255))")
        t0 = time.perf_counter()
        conn.execute(
            sa.text(f"INSERT INTO {table_name} (alias_id) VALUES (:alias_id)"),
            data,
        )
        print(
            f"{rows_to_insert:,} rows inserted in {(time.perf_counter() - t0):0.1f} seconds"
        )
    

    simply changing the create_engine() call to

    engine = sa.create_engine(connection_url, fast_executemany=True)
    

    reduces the execution time to less than a second.
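    As a side note, the same switch is available at the pyodbc level: SQLAlchemy's `fast_executemany=True` ultimately sets `cursor.fast_executemany = True` before `executemany()` is called. Here is a minimal sketch of the pyodbc-only equivalent, with a batching helper so a 100k-row insert is sent in chunks rather than one enormous parameter array. The DSN `mydb_dsn_sybase` and the `sa_PWD` environment variable are assumptions carried over from the example above.

    ```python
    import os


    def chunked(rows, size):
        """Yield successive slices of `rows`, each at most `size` long."""
        for i in range(0, len(rows), size):
            yield rows[i:i + size]


    def bulk_insert(table_name, rows, batch_size=10_000):
        # pyodbc is imported here so chunked() stays usable without the driver.
        import pyodbc

        # DSN/credentials are placeholders borrowed from the example above.
        cnxn = pyodbc.connect(
            f"DSN=mydb_dsn_sybase;UID=sa;PWD={os.environ['sa_PWD']}"
        )
        cursor = cnxn.cursor()
        cursor.fast_executemany = True  # send each batch in a single round trip
        for batch in chunked(rows, batch_size):
            cursor.executemany(
                f"INSERT INTO {table_name} (alias_id) VALUES (?)", batch
            )
        cnxn.commit()


    rows = [(f"alias_{x}",) for x in range(100_000)]
    # bulk_insert("#tmp", rows)  # requires a live SAP ASE DSN to run
    ```

    Keeping the batches at a fixed size also gives you a natural place to log per-batch timings, as in the original question's loop.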