NotNullViolation Error When Inserting Data into PostgreSQL Table with Non-Nullable ID Column


I'm working on inserting data from a Pandas DataFrame into a PostgreSQL table using Python. The table structure is as follows:

CREATE TABLE sales (
    id BIGINT NOT NULL,  -- Primary Key
    tahun INTEGER NOT NULL,
    bulan NUMERIC NOT NULL,
    v_kod VARCHAR(10) NOT NULL,
    o_kod VARCHAR(10) NOT NULL,
    amaun_rm NUMERIC NOT NULL,
    dt_updated TIMESTAMP WITHOUT TIME ZONE NOT NULL
);

Here is my Python code:

import pandas as pd
import psycopg2

# Load the CSV data
forecast_results = pd.read_csv("sales.csv")

# Filter the DataFrame to include only rows from September 2024 onward
filtered_forecast_results = forecast_results[(forecast_results['tahun'] > 2024) | 
                                             ((forecast_results['tahun'] == 2024) & (forecast_results['bulan'] >= 9))]

# Define the v_kod value to be inserted
v_kod_value = 'AAA'

# Connect to PostgreSQL
conn = psycopg2.connect(
    dbname="my_database",
    user="my_user",
    password="my_password",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

for index, row in filtered_forecast_results.iterrows():
    # Convert the year and month to integers, but keep o_kod as a string
    tahun = int(row['tahun'])
    bulan = int(row['bulan'])
    o_kod = str(row['o_kod'])
    
    # Check if the row already exists
    cur.execute("""
        SELECT 1 FROM sales 
        WHERE tahun = %s AND bulan = %s AND v_kod = %s AND o_kod = %s
    """, (tahun, bulan, v_kod_value, o_kod))
    
    exists = cur.fetchone()
    
    if not exists:
        # If the row does not exist, insert it
        sql_query = """
            INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
            VALUES (%s, %s, %s, %s, %s, NOW())
        """
        values = (
            tahun,
            bulan,
            v_kod_value,
            o_kod,
            round(row['predicted_amaun_rm'], 2)
        )
        cur.execute(sql_query, values)

# Commit the transaction
conn.commit()

# Close the cursor and connection
cur.close()
conn.close()

When I run this code, I encounter the following error:

NotNullViolation: null value in column "id" of relation "sales" violates not-null constraint
DETAIL:  Failing row contains (null, 2024, 9, AAA, 123, 2931.48, 2024-08-16 08:39:52.462847).

What I’ve Tried:

  • I haven’t manually specified an id because I thought it should be auto-incremented.

Questions:

  1. How can I properly insert the data while ensuring that the id column is populated correctly?
  2. Should the id column be set up as an auto-incrementing column in PostgreSQL? If so, how can I modify the table to achieve this?
  3. Is there a way to fetch and use the next available ID value within my Python code before insertion?

Solution

  • I haven’t manually specified an id because I thought it should be auto-incremented.

    The id is only filled in automatically if the column has a default, which your table definition lacks. You get one by using a serial or IDENTITY column (or by adding a column default some other way). A serial or IDENTITY column draws from an underlying SEQUENCE. You then simply omit the id column in an INSERT and the next SEQUENCE value is assigned.

    Alter your table (once), then your code should work as is.
    I suggest an IDENTITY column. Run once:

    DO
    $do$
    BEGIN
       EXECUTE format('ALTER TABLE sales
                       ALTER id ADD GENERATED ALWAYS AS IDENTITY (RESTART %s)' 
                    , (SELECT max(id) + 1 FROM sales));
    END
    $do$;
    


    The RESTART clause also brings the sequence in sync with the current maximum id value immediately.

    Note: if you make it GENERATED ALWAYS, you need special syntax (OVERRIDING SYSTEM VALUE) to override that and insert id values manually, which you typically shouldn't do.
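
Once id is an IDENTITY column, the INSERT from the question works unchanged because it never mentions id. If you also want the generated value back in Python, a RETURNING clause can hand it over in the same statement. The following is only a sketch: `insert_sale` is a hypothetical helper name, and `cur` is assumed to be an open DB-API cursor such as the psycopg2 cursor from the question.

```python
# Sketch only: `insert_sale` is a hypothetical helper, and `cur` is assumed
# to be an open DB-API cursor (for example the psycopg2 cursor in the question).
INSERT_SALE_SQL = """
    INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    VALUES (%s, %s, %s, %s, %s, NOW())
    RETURNING id
"""


def insert_sale(cur, tahun, bulan, v_kod, o_kod, amaun_rm):
    """Insert one row without an id and return the id PostgreSQL generated."""
    # The id column is deliberately absent from the column list, so the
    # IDENTITY default supplies the next sequence value.
    cur.execute(INSERT_SALE_SQL, (tahun, bulan, v_kod, o_kod, amaun_rm))
    # RETURNING makes the INSERT produce a one-row result holding the new id.
    return cur.fetchone()[0]
```

Inside the loop this would replace the plain `cur.execute(sql_query, values)` call, for example `new_id = insert_sale(cur, tahun, bulan, v_kod_value, o_kod, amaun)`.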

    The above answers questions 1 and 2.

    3. Is there a way to fetch and use the next available ID value within my Python code before insertion?

    Once you have converted the column, you can use sequence manipulation functions including nextval():

    SELECT nextval(pg_get_serial_sequence('sales', 'id'));
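
From Python, that query can be run like any other statement. The sketch below assumes the column has already been converted and that `cur` is an open DB-API cursor such as the psycopg2 cursor from the question; `fetch_next_id` and `insert_sale_with_id` are hypothetical helper names. Note that every nextval() call consumes a sequence value even if it is never inserted, and that a GENERATED ALWAYS column only accepts an explicit id together with OVERRIDING SYSTEM VALUE.

```python
# Sketch only: both helpers are hypothetical, and `cur` is assumed to be an
# open DB-API cursor (for example the psycopg2 cursor in the question).
NEXT_ID_SQL = "SELECT nextval(pg_get_serial_sequence('sales', 'id'))"

INSERT_WITH_ID_SQL = """
    INSERT INTO sales (id, tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    OVERRIDING SYSTEM VALUE
    VALUES (%s, %s, %s, %s, %s, %s, NOW())
"""


def fetch_next_id(cur):
    """Reserve and return the next value of the sequence behind sales.id."""
    cur.execute(NEXT_ID_SQL)
    return cur.fetchone()[0]


def insert_sale_with_id(cur, tahun, bulan, v_kod, o_kod, amaun_rm):
    """Fetch an id first, then insert the row with that explicit id."""
    new_id = fetch_next_id(cur)
    # OVERRIDING SYSTEM VALUE is needed because the column is GENERATED ALWAYS.
    cur.execute(INSERT_WITH_ID_SQL, (new_id, tahun, bulan, v_kod, o_kod, amaun_rm))
    return new_id
```

In most cases it is simpler to omit id and let the column default assign it; fetching the value up front mainly helps when the id is needed before the row exists.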
    
