I need help building an SQL query to identify duplicate rows based on multiple columns.
For example, if I have this table:
CREATE TABLE Users
(
ID int,
NAME varchar(255),
ADDRESS varchar(255)
);
with this data in it:
ID | NAME | ADDRESS |
---|---|---|
1 | John | NY |
2 | Jake | NJ |
3 | John | NY |
I'd expect to get the below back
ID | NAME | ADDRESS |
---|---|---|
1 | John | NY |
3 | John | NY |
Here's a query that will show you some duplicates based on columns NAME and ADDRESS:
SELECT MIN(ID) as ID, NAME, ADDRESS
FROM Users
GROUP BY NAME, ADDRESS
HAVING COUNT(*) > 1;
This query:
Groups the data by NAME and ADDRESS
Uses the HAVING clause to filter only for groups that have more than one row (the duplicates)
Returns the lowest ID for each duplicate group, along with the NAME and ADDRESS
If you want to see all duplicate rows instead of just the first occurrence, you could use:
SELECT ID, NAME, ADDRESS
FROM Users
WHERE (NAME, ADDRESS) IN (
SELECT NAME, ADDRESS
FROM Users
GROUP BY NAME, ADDRESS
HAVING COUNT(*) > 1
);