[SOLVED] sql database delete duplicate relationships

sql database delete duplicate relationships

I have a simple database table with over 5000 records. Each record says for example 'john' is related to 'jack'. Almost every relationship is likely to be in the database twice because there could be a record 'jack' is related to 'john', which expresses the same relationship. How can I identify and delete these kinds of 'duplicate' records?

Solution

To delete such records, you could use exists logic which checks for the presence of the same pair in the reverse order. Assuming arbitrarily that you want to retain the pair with the lexicographically lowest name first, consider:

DELETE
FROM yourTable
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.name2 = yourTable.name1 AND
                    t2.name1 = yourTable.name2) AND
      name1 >= name2;

This assumes that your table has 2 columns name1 and name2 which store the relationships. For example, the following data:

name1 | name2
Jack  | Henry
Henry | Jack
Paul  | Revere

would become:

name1 | name2
Henry | Jack
Paul  | Revere