sql

SQL query to find duplicate rows based on multiple columns


I need help building an SQL query to identify duplicate rows based on multiple columns.

For example, if I have this table:

CREATE TABLE Users 
(
    ID int,
    NAME varchar(255),
    ADDRESS varchar(255)
);

with this data in it:

ID NAME ADDRESS
1 John NY
2 Jake NJ
3 John NY

I'd expect to get the below back

ID NAME ADDRESS
1 John NY
3 John NY

Solution

  • Here's a query that will show you some duplicates based on columns NAME and ADDRESS:

    SELECT MIN(ID) as ID, NAME, ADDRESS
    FROM Users
    GROUP BY NAME, ADDRESS
    HAVING COUNT(*) > 1;
    

    This query:

    1. Groups the data by NAME and ADDRESS

    2. Uses the HAVING clause to filter only for groups that have more than one row (the duplicates)

    3. Returns the lowest ID for each duplicate group, along with the NAME and ADDRESS

    If you want to see all duplicate rows instead of just the first occurrence, you could use:

    SELECT ID, NAME, ADDRESS
    FROM Users
    WHERE (NAME, ADDRESS) IN (
        SELECT NAME, ADDRESS
        FROM Users
        GROUP BY NAME, ADDRESS
        HAVING COUNT(*) > 1
    );