pythonpandaspostgresqlsqlalchemyupsert

How to upsert pandas DataFrame to PostgreSQL table?


I've scraped some data from web sources and stored it all in a pandas DataFrame. Now, in order harness the powerful db tools afforded by SQLAlchemy, I want to convert said DataFrame into a Table() object and eventually upsert all data into a PostgreSQL table. If this is practical, what is a workable method of going about accomplishing this task?


Solution

  • Update: You can save yourself some typing by using this method.


    If you are using PostgreSQL 9.5 or later you can perform the UPSERT using a temporary table and an INSERT ... ON CONFLICT statement:

    import sqlalchemy as sa
    
    # …
    
    with engine.begin() as conn:
        # step 0.0 - create test environment
        conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
        conn.exec_driver_sql(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
        conn.exec_driver_sql(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
        # step 0.1 - create DataFrame to UPSERT
        df = pd.DataFrame(
            [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
        )
        
        # step 1 - create temporary table and upload DataFrame
        conn.exec_driver_sql(
            "CREATE TEMPORARY TABLE temp_table AS SELECT * FROM main_table WHERE false"
        )
        df.to_sql("temp_table", conn, index=False, if_exists="append")
    
        # step 2 - merge temp_table into main_table
        conn.exec_driver_sql(
            """\
            INSERT INTO main_table (id, txt) 
            SELECT id, txt FROM temp_table
            ON CONFLICT (id) DO
                UPDATE SET txt = EXCLUDED.txt
            """
        )
    
        # step 3 - confirm results
        result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
        print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]