I'm working on inserting data from a Pandas DataFrame into a PostgreSQL table using Python. The table structure is as follows:
CREATE TABLE sales (
    id BIGINT NOT NULL, -- Primary Key
    tahun INTEGER NOT NULL,
    bulan NUMERIC NOT NULL,
    v_kod VARCHAR(10) NOT NULL,
    o_kod VARCHAR(10) NOT NULL,
    amaun_rm NUMERIC NOT NULL,
    dt_updated TIMESTAMP WITHOUT TIME ZONE NOT NULL
);
Here is my Python code:
import pandas as pd
import psycopg2
# Load the CSV data
forecast_results = pd.read_csv("sales.csv")
# Filter the DataFrame to include only rows from September 2024 onward
filtered_forecast_results = forecast_results[(forecast_results['tahun'] > 2024) |
((forecast_results['tahun'] == 2024) & (forecast_results['bulan'] >= 9))]
# Define the v_kod value to be inserted
v_kod_value = 'AAA'
# Connect to PostgreSQL
conn = psycopg2.connect(
dbname="my_database",
user="my_user",
password="my_password",
host="localhost",
port="5432"
)
cur = conn.cursor()
for index, row in filtered_forecast_results.iterrows():
    # Convert the year and month to integers, but keep o_kod as a string
    tahun = int(row['tahun'])
    bulan = int(row['bulan'])
    o_kod = str(row['o_kod'])
    # Check if the row already exists
    cur.execute("""
        SELECT 1 FROM sales
        WHERE tahun = %s AND bulan = %s AND v_kod = %s AND o_kod = %s
    """, (tahun, bulan, v_kod_value, o_kod))
    exists = cur.fetchone()
    if not exists:
        # If the row does not exist, insert it
        sql_query = """
            INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
            VALUES (%s, %s, %s, %s, %s, NOW())
        """
        values = (
            tahun,
            bulan,
            v_kod_value,
            o_kod,
            round(row['predicted_amaun_rm'], 2)
        )
        cur.execute(sql_query, values)
# Commit the transaction
conn.commit()
# Close the cursor and connection
cur.close()
conn.close()
When I run this code, I encounter the following error:
NotNullViolation: null value in column "id" of relation "sales" violates not-null constraint
DETAIL: Failing row contains (null, 2024, 9, AAA, 123, 2931.48, 2024-08-16 08:39:52.462847).
What I’ve Tried:
I haven’t manually specified an id because I thought it should be auto-incremented.
Questions:
A default value is provided if you use a serial or IDENTITY column (or otherwise add a column default). A serial or IDENTITY column draws from an underlying SEQUENCE. Then you simply omit the id column in an INSERT to get the next SEQUENCE value assigned.
Alter your table (once), then your code should work as is.
I suggest an IDENTITY column. Run once:
DO
$do$
BEGIN
   EXECUTE format('ALTER TABLE sales
                   ALTER id ADD GENERATED ALWAYS AS IDENTITY (RESTART %s)'
                , (SELECT max(id) + 1 FROM sales));
END
$do$;
This also brings the sequence in sync with the current maximum id value right away.
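Once id is an identity column, an INSERT that never mentions id gets its value from the sequence. A minimal sketch of that in Python follows. The helper name insert_sales_rows and the tuple layout are assumptions made for illustration, and cur is assumed to be an open psycopg2 cursor (any DB-API cursor with executemany() would do):

```python
# Hypothetical helper, not part of the original code. Assumes `cur` is an
# open psycopg2 cursor and that sales.id is already an identity column.
INSERT_SQL = """
    INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    VALUES (%s, %s, %s, %s, %s, NOW())
"""


def insert_sales_rows(cur, rows):
    """Insert (tahun, bulan, v_kod, o_kod, amaun_rm) tuples.

    The id column is left out of the column list on purpose, so
    PostgreSQL assigns the next identity value to every row.
    Returns the number of parameter tuples sent.
    """
    params = [
        (int(tahun), int(bulan), str(v_kod), str(o_kod), round(float(amaun_rm), 2))
        for tahun, bulan, v_kod, o_kod, amaun_rm in rows
    ]
    cur.executemany(INSERT_SQL, params)
    return len(params)
```

The caller still has to run conn.commit() afterwards, as in the original script.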
Note: if you make it GENERATED ALWAYS, you need special syntax (OVERRIDING SYSTEM VALUE) to override that and insert values manually, which you typically shouldn't do.
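For the rare case where an explicit id really has to be written into a GENERATED ALWAYS column, the statement needs an OVERRIDING SYSTEM VALUE clause. The sketch below shows one way to do that from Python. The helper name insert_with_explicit_id is hypothetical, and cur is assumed to be an open psycopg2 cursor:

```python
# Hypothetical helper, not part of the original code. Assumes `cur` is an
# open psycopg2 cursor and sales.id is GENERATED ALWAYS AS IDENTITY.
OVERRIDE_SQL = """
    INSERT INTO sales (id, tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    OVERRIDING SYSTEM VALUE
    VALUES (%s, %s, %s, %s, %s, %s, NOW())
"""


def insert_with_explicit_id(cur, row_id, tahun, bulan, v_kod, o_kod, amaun_rm):
    """Insert one row with a caller-supplied id.

    Without OVERRIDING SYSTEM VALUE, PostgreSQL rejects an explicit id
    for a GENERATED ALWAYS identity column.
    """
    cur.execute(OVERRIDE_SQL, (row_id, tahun, bulan, v_kod, o_kod, amaun_rm))
```

Keep in mind that a manually supplied id does not advance the underlying sequence, so a later generated value can collide with it.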
The above answers questions 1. & 2.
- Is there a way to fetch and use the next available ID value within my Python code before insertion?
Once you have converted the column, you can use sequence manipulation functions, including nextval():
SELECT nextval(pg_get_serial_sequence('sales', 'id'));
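From Python, that query can be run through the same cursor. The sketch below is illustrative only: the helper names fetch_next_id and insert_and_get_id are hypothetical, and cur is assumed to be an open psycopg2 cursor. The second helper shows a common alternative, letting the INSERT hand back the generated id through RETURNING instead of fetching it beforehand:

```python
# Hypothetical helpers, not part of the original code. Assume `cur` is an
# open psycopg2 cursor and sales.id is an identity (or serial) column.
NEXT_ID_SQL = "SELECT nextval(pg_get_serial_sequence('sales', 'id'))"

INSERT_RETURNING_SQL = """
    INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    VALUES (%s, %s, %s, %s, %s, NOW())
    RETURNING id
"""


def fetch_next_id(cur):
    """Reserve and return the next value of the sequence behind sales.id."""
    cur.execute(NEXT_ID_SQL)
    return cur.fetchone()[0]


def insert_and_get_id(cur, tahun, bulan, v_kod, o_kod, amaun_rm):
    """Insert one row without an id and return the id PostgreSQL assigned."""
    cur.execute(INSERT_RETURNING_SQL, (tahun, bulan, v_kod, o_kod, amaun_rm))
    return cur.fetchone()[0]
```

A value obtained from nextval() is consumed even if it is never inserted, so gaps in id are normal. With a GENERATED ALWAYS column, writing a pre-fetched id back also requires the override syntax mentioned earlier, which is why the RETURNING variant is usually the simpler choice.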