Tags: python, django, postgresql

Better way to insert very large amounts of data into PostgreSQL tables


What is a better way to insert very large amounts of data into a PostgreSQL table?
OS: Ubuntu 22.04 LTS
DB: PostgreSQL 14
Framework: Python 3.11 Django

For now I am using an INSERT INTO statement with 100,000 rows at a time. The whole process of inserting an average of 1,000,000 rows takes about 2 minutes, which is within my acceptable range, but I want to know if there is a better way to do this.

It was working fine, but it has started taking more time and sometimes fails with errors like:

OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
from django.db import connection

cursor = connection.cursor()

batch_size = 100000
offset = 0

while True:
    # Page through the source table in batches of batch_size rows.
    transaction_list_query = f"SELECT * FROM {source_table} LIMIT {batch_size} OFFSET {offset};"
    cursor.execute(transaction_list_query)
    transaction_list = dictfetchall(cursor)
    if not transaction_list:
        break

    data_to_insert = []
    for transaction in transaction_list:
        # Some process-intensive calculations that append a rendered
        # "(val1, val2, ...)" values string to data_to_insert
        ...

    # Multi-row upsert built from the pre-rendered values strings.
    insert_query = (
        f"INSERT INTO {per_transaction_table} "
        f"({company_ref_id_id_column}, {rrn_column}, "
        f"{transaction_type_ref_id_id_column}, {transactionamount_column}) "
        f"VALUES {','.join(data_to_insert)} "
        f"ON CONFLICT ({rrn_column}) DO UPDATE SET "
        f"{company_ref_id_id_column} = EXCLUDED.{company_ref_id_id_column};"
    )
    cursor.execute(insert_query)
    offset += batch_size

Solution

  • A faster way could be to use a prepared statement in the session, and then repeatedly EXECUTE it with a new batch of rows (a sketch follows below).

    An even faster way would be to use COPY (optionally WITH (FREEZE), e.g. on unlogged tables): https://www.postgresql.org/docs/current/sql-copy.html, and add indexes and constraints later (a COPY sketch also follows below).
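
A minimal sketch of the prepared-statement idea, assuming the psycopg2-backed Django connection from the question stays open for the whole run; the table name per_transaction, the four column names, the parameter types, and row_batches are placeholders derived from the question, not real objects in your schema:

    from django.db import connection

    with connection.cursor() as cursor:
        # Prepare the single-row upsert once per session; only parameter
        # values travel over the wire on each EXECUTE.
        cursor.execute("""
            PREPARE upsert_txn (bigint, text, bigint, numeric) AS
            INSERT INTO per_transaction
                (company_ref_id_id, rrn, transaction_type_ref_id_id, transactionamount)
            VALUES ($1, $2, $3, $4)
            ON CONFLICT (rrn) DO UPDATE SET company_ref_id_id = EXCLUDED.company_ref_id_id;
        """)

        # row_batches is assumed to yield lists of (company_ref_id_id, rrn,
        # transaction_type_ref_id_id, transactionamount) tuples.
        for batch in row_batches:
            cursor.executemany("EXECUTE upsert_txn (%s, %s, %s, %s)", batch)

        cursor.execute("DEALLOCATE upsert_txn;")

psycopg2.extras.execute_batch (or execute_values with a multi-row VALUES clause) can replace executemany to cut down on round trips. Prepared statements only live as long as the session, so this assumes the connection is not dropped mid-run.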
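
And a sketch of the COPY route using psycopg2's copy_expert, streaming already-computed rows from an in-memory CSV buffer. per_transaction_staging is a hypothetical staging table used so the ON CONFLICT upsert from the question can still be applied afterwards, and computed_rows stands in for the output of your per-row calculations:

    import csv
    import io

    from django.db import connection

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in computed_rows:   # computed_rows: iterable of value tuples (assumed)
        writer.writerow(row)
    buffer.seek(0)

    with connection.cursor() as cursor:
        # Bulk-load into the staging table; COPY is far cheaper than many INSERTs.
        cursor.copy_expert(
            "COPY per_transaction_staging "
            "(company_ref_id_id, rrn, transaction_type_ref_id_id, transactionamount) "
            "FROM STDIN WITH (FORMAT csv)",
            buffer,
        )
        # Upsert from staging into the real table in one statement.
        cursor.execute("""
            INSERT INTO per_transaction
                (company_ref_id_id, rrn, transaction_type_ref_id_id, transactionamount)
            SELECT company_ref_id_id, rrn, transaction_type_ref_id_id, transactionamount
            FROM per_transaction_staging
            ON CONFLICT (rrn) DO UPDATE SET company_ref_id_id = EXCLUDED.company_ref_id_id;
        """)

Note that if indexes and constraints are deferred as suggested, the unique constraint on rrn still has to exist before the ON CONFLICT step runs.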