Currently I am inserting into a Postgres database using psycopg2. The data is large and the write frequency is high, so my database has WAL disabled and a few other optimizations for faster writes. When I use psycopg2 with execute_values, I am able to write a batch of 1000 rows in 0.1-0.15 seconds.
```python
import time
import logging

from psycopg2.extras import execute_values
from sqlalchemy import create_engine

logger = logging.getLogger(__name__)

self.engine = create_engine('postgresql+psycopg2://postgres:password@localhost/postgres',
                            pool_size=DB_POOL_SIZE, max_overflow=20)

def insert_data_todb(self, table_name, batch_data):
    try:
        t1 = time.perf_counter()
        insert_sql = f"INSERT INTO {table_name} ({self._market_snapshot_columns_str}) VALUES %s;"
        # Drop down to the raw psycopg2 connection so execute_values can be used
        with self.engine.connect() as conn, conn.connection.cursor() as cur:
            execute_values(cur, insert_sql, batch_data)
        t2 = time.perf_counter()
        logger.info(f"Inserted {len(batch_data)} records in {t2 - t1} seconds")
    except Exception as ex:
        logger.error(f"Error inserting batch data into {table_name}:")
        logger.exception(ex)
```
I uninstalled psycopg2, installed psycopg 3.2, and used psycopg3's executemany like this:
```python
import time
import logging

import psycopg  # psycopg 3
from sqlalchemy import create_engine

logger = logging.getLogger(__name__)

self.engine = create_engine('postgresql+psycopg://postgres:password@localhost/postgres',
                            pool_size=DB_POOL_SIZE, max_overflow=20)

def insert_data_todb(self, table_name, batch_data):
    try:
        t1 = time.perf_counter()
        # Build one placeholder per column, e.g. (%s, %s, %s)
        placeholders = ', '.join(['%s'] * len(batch_data[0]))
        insert_sql = f"INSERT INTO {table_name} ({self._market_snapshot_columns_str}) VALUES ({placeholders});"
        with self.engine.connect() as conn, conn.connection.cursor() as cur:
            cur.executemany(insert_sql, batch_data)  # pass the batch data directly
        t2 = time.perf_counter()
        logger.info(f"Inserted {len(batch_data)} records in {t2 - t1} seconds")
    except Exception as ex:
        logger.error(f"Error inserting batch data into {table_name}:")
        logger.exception(ex)
```
My psycopg3 code is way slower! It takes 8-20 seconds to insert the same batches.
These examples aren't equivalent, and it's not about the psycopg version: `cur.executemany()` in the second example runs one INSERT per row, while `execute_values()` in the first example constructs INSERTs with longer VALUES lists, which is typically much more efficient.
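psycopg3 has no direct `execute_values()` equivalent, but you can get the same effect by building multi-row VALUES statements yourself. A minimal sketch, assuming `conn` is an open psycopg connection and all rows have the same length (`insert_multivalues` and `page_size` are illustrative names, not a psycopg API):

```python
import psycopg

def insert_multivalues(conn, table, columns, rows, page_size=1000):
    """Insert rows in pages, one multi-row VALUES statement per page."""
    row_ph = "(" + ", ".join(["%s"] * len(columns)) + ")"
    with conn.cursor() as cur:
        for start in range(0, len(rows), page_size):
            page = rows[start:start + page_size]
            sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
                   f"VALUES {', '.join([row_ph] * len(page))}")
            # Flatten the page: one execute() sends one statement covering N rows.
            # page_size also keeps len(columns) * N under Postgres's 65535
            # bind-parameter limit.
            cur.execute(sql, [v for row in page for v in row])
    conn.commit()
```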
Both of these, however, lose to a very simple batch insert that uses COPY instead of INSERT. Quoting the result summary from an example benchmark:
| Function | Time (seconds) | Memory (MB) |
|---|---|---|
| insert_one_by_one() | 128.8 | 0.08203125 |
| insert_executemany() | 124.7 | 2.765625 |
| insert_executemany_iterator() | 129.3 | 0.0 |
| insert_execute_batch() | 3.917 | 2.50390625 |
| insert_execute_batch_iterator(page_size=1) | 130.2 | 0.0 |
| insert_execute_batch_iterator(page_size=100) | 4.333 | 0.0 |
| insert_execute_batch_iterator(page_size=1000) | 2.537 | 0.2265625 |
| insert_execute_batch_iterator(page_size=10000) | 2.585 | 25.4453125 |
| insert_execute_values() | 3.666 | 4.50390625 |
| insert_execute_values_iterator(page_size=1) | 127.4 | 0.0 |
| insert_execute_values_iterator(page_size=100) | 3.677 | 0.0 |
| insert_execute_values_iterator(page_size=1000) | 1.468 | 0.0 |
| insert_execute_values_iterator(page_size=10000) | 1.503 | 2.25 |
| copy_stringio() | 0.6274 | 99.109375 |
| copy_string_iterator(size=1024) | 0.4536 | 0.0 |
| copy_string_iterator(size=8192) | 0.4596 | 0.0 |
| copy_string_iterator(size=16384) | 0.4649 | 0.0 |
| copy_string_iterator(size=65536) | 0.6171 | 0.0 |
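In psycopg3, COPY is exposed directly on the cursor, so the fastest path here needs no string building at all. A minimal sketch under the same table/column assumptions as above (`copy_rows` is an illustrative name; `cur.copy()` and `copy.write_row()` are real psycopg3 APIs):

```python
import psycopg

def copy_rows(conn, table, columns, rows):
    """Stream rows to the server with COPY ... FROM STDIN."""
    stmt = f"COPY {table} ({', '.join(columns)}) FROM STDIN"
    with conn.cursor() as cur:
        with cur.copy(stmt) as copy:
            for row in rows:
                copy.write_row(row)  # psycopg handles the wire encoding
    conn.commit()
```

With psycopg2 you would instead feed a file-like object to its COPY support, which is roughly what the copy_stringio()/copy_string_iterator() rows in the table above measure.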