My query deletes the whole table instead of duplicate rows. Video as proof: https://streamable.com/3s843
create table customer_info (
id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
phone_number VARCHAR(50)
);
insert into customer_info (id, first_name, last_name, phone_number) values
(1, 'Kevin', 'Binley', '600-449-1059'),
(1, 'Kevin', 'Binley', '600-449-1059'),
(2, 'Skippy', 'Lam', '779-278-0889');
My query:
with t1 as (
select *, row_number() over(partition by id order by id) as rn
from customer_info)
delete
from customer_info
where id in (select id from t1 where rn > 1);
Your query would delete all rows from each set of dupes (all share the same id
by which you select - that's what wildplasser hinted at with subtle comments). Only initially unique rows would survive. So if it "deletes the whole table", that means there were no unique rows at all.
In your query, dupes are defined by (id)
alone, not by the whole row as your title suggests.
Either way, there is a remarkably simple solution:
DELETE FROM customer_info c
WHERE EXISTS (
SELECT FROM customer_info c1
WHERE ctid < c.ctid
AND c1 = c -- comparing whole rows
);
Since you deal with completely identical rows, the remaining way to tell them apart is the internal tuple ID ctid
.
Careful with table inheritance or partitioning. Then there can be multiple physical tables involved and ctid
is not unique within the scope. (But there shouldn't be completely identical rows in partition or inheritance children.) See:
My query deletes all rows, where an identical row with a smaller ctid
exists. Hence, only the "first" row from each set of dupes survives.
Notably, NULL
values compare equal in this case - which is most probably as desired. The manual:
The SQL specification requires row-wise comparison to return NULL if the result depends on comparing two NULL values or a NULL and a non-NULL. PostgreSQL does this only when comparing the results of two row constructors (as in Section 9.23.5) or comparing a row constructor to the output of a subquery (as in Section 9.22). In other contexts where two composite-type values are compared, two NULL field values are considered equal, [...]
If dupes are defined by id
alone (as your query suggests), then this would work:
DELETE FROM customer_info c
WHERE EXISTS (
SELECT FROM customer_info c1
WHERE ctid < c.ctid
AND id = c.id
);
But then there might be a better way to decide which rows to keep than ctid
as a measure of last resort!
Obviously, you would then add a PRIMARY KEY
to avoid the initial dilemma from reappearing. For the second interpretation, id
is the candidate.
Related:
About ctid
: