Tags: python, postgresql, psycopg2

Batch insert data using psycopg2 vs psycopg3


Currently I am inserting into a Postgres database using psycopg2. The data volume is large and the write frequency is high, so my database has WAL disabled and a few other optimizations for faster writes.

When I use psycopg2 with execute_values, I am able to write a batch of 1000 rows in 0.1-0.15 seconds.

    import time
    import logging

    from psycopg2.extras import execute_values
    from sqlalchemy import create_engine

    logger = logging.getLogger(__name__)

    # in the class __init__:
    self.engine = create_engine(
        'postgresql+psycopg2://postgres:password@localhost/postgres',
        pool_size=DB_POOL_SIZE, max_overflow=20)

    # class method:
    def insert_data_todb(self, table_name, batch_data):
        try:
            t1 = time.perf_counter()
            insert_sql = f"INSERT INTO {table_name} ({self._market_snapshot_columns_str}) VALUES %s;"
            # raw driver cursor from the pooled SQLAlchemy connection
            with self.engine.connect() as conn, conn.connection.cursor() as cur:
                execute_values(cur, insert_sql, batch_data)
            t2 = time.perf_counter()
            logger.info(f"Inserted {len(batch_data)} records in {t2 - t1} seconds")
        except Exception as ex:
            logger.error(f"Error inserting batch data into {table_name}:")
            logger.exception(ex)

I uninstalled psycopg2, installed psycopg 3.2, and used psycopg 3's executemany like this:

    import time
    import logging

    import psycopg  # the driver itself is selected via the URL below
    from sqlalchemy import create_engine

    logger = logging.getLogger(__name__)

    # in the class __init__:
    self.engine = create_engine(
        'postgresql+psycopg://postgres:password@localhost/postgres',
        pool_size=DB_POOL_SIZE, max_overflow=20)

    # class method:
    def insert_data_todb(self, table_name, batch_data):
        try:
            t1 = time.perf_counter()
            placeholders = ', '.join(['%s'] * len(batch_data[0]))
            insert_sql = f"INSERT INTO {table_name} ({self._market_snapshot_columns_str}) VALUES ({placeholders});"

            # raw driver cursor, as in the psycopg2 version
            with self.engine.connect() as conn, conn.connection.cursor() as cur:
                cur.executemany(insert_sql, batch_data)
            t2 = time.perf_counter()
            logger.info(f"Inserted {len(batch_data)} records in {t2 - t1} seconds")
        except Exception as ex:
            logger.error(f"Error inserting batch data into {table_name}:")
            logger.exception(ex)

My psycopg 3 code is way slower! It takes 8-20 seconds to insert the same batches.


Solution

  • These examples aren't equivalent, and the gap has little to do with the psycopg version: psycopg2's execute_values folds the batch into multi-row VALUES lists and sends one INSERT per page, while psycopg 3's executemany runs the single-row INSERT once per parameter tuple, so per-statement overhead dominates. A sketch of the difference follows.
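
    To make that concrete, the snippet below shows the two calls side by side with (in comments) roughly what each one sends to the server. It assumes a throwaway table t(a int, b text) and a placeholder DSN; note that execute_values pages at 100 rows per statement by default, and that psycopg 3.1+ pipelines executemany to cut round trips but still issues one statement per row:

        import psycopg
        import psycopg2
        from psycopg2.extras import execute_values

        dsn = "postgresql://postgres:password@localhost/postgres"  # placeholder
        rows = [(1, "x"), (2, "y")]

        # psycopg2: one multi-row statement per page (page_size=100 by default), e.g.
        #   INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y')
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            execute_values(cur, "INSERT INTO t (a, b) VALUES %s", rows)

        # psycopg 3: the same single-row statement executed once per tuple, e.g.
        #   INSERT INTO t (a, b) VALUES ($1, $2)   -- len(rows) times
        with psycopg.connect(dsn) as conn, conn.cursor() as cur:
            cur.executemany("INSERT INTO t (a, b) VALUES (%s, %s)", rows)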

    Either way, both lose to a very simple batch insert that uses COPY instead of INSERT (see the sketch after the table). Quoting the result summary from an example benchmark:

    Function                                         Time (seconds)   Memory (MB)
    insert_one_by_one()                              128.8            0.08203125
    insert_executemany()                             124.7            2.765625
    insert_executemany_iterator()                    129.3            0.0
    insert_execute_batch()                           3.917            2.50390625
    insert_execute_batch_iterator(page_size=1)       130.2            0.0
    insert_execute_batch_iterator(page_size=100)     4.333            0.0
    insert_execute_batch_iterator(page_size=1000)    2.537            0.2265625
    insert_execute_batch_iterator(page_size=10000)   2.585            25.4453125
    insert_execute_values()                          3.666            4.50390625
    insert_execute_values_iterator(page_size=1)      127.4            0.0
    insert_execute_values_iterator(page_size=100)    3.677            0.0
    insert_execute_values_iterator(page_size=1000)   1.468            0.0
    insert_execute_values_iterator(page_size=10000)  1.503            2.25
    copy_stringio()                                  0.6274           99.109375
    copy_string_iterator(size=1024)                  0.4536           0.0
    copy_string_iterator(size=8192)                  0.4596           0.0
    copy_string_iterator(size=16384)                 0.4649           0.0
    copy_string_iterator(size=65536)                 0.6171           0.0
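
    For reference, here is a minimal sketch of the COPY approach using psycopg 3's copy API; the connection string, table name, and column list below are placeholders standing in for the question's setup:

        import psycopg

        def copy_batch(conninfo, table_name, columns_str, batch_data):
            # One COPY ... FROM STDIN round trip for the whole batch;
            # write_row() lets psycopg serialize and escape each row.
            with psycopg.connect(conninfo) as conn:  # commits on clean exit
                with conn.cursor() as cur:
                    with cur.copy(f"COPY {table_name} ({columns_str}) FROM STDIN") as copy:
                        for row in batch_data:
                            copy.write_row(row)

        copy_batch(
            "postgresql://postgres:password@localhost/postgres",  # placeholder DSN
            "market_snapshot",   # hypothetical table
            "symbol, bid, ask",  # hypothetical column list
            [("AAPL", 189.1, 189.2), ("MSFT", 411.0, 411.1)],
        )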