I have a simple database table with over 5000 records. Each record says for example 'john' is related to 'jack'. Almost every relationship is likely to be in the database twice because there could be a record 'jack' is related to 'john', which expresses the same relationship. How can I identify and delete these kinds of 'duplicate' records?
To delete such records, you could use exists logic which checks for the presence of the same pair in the reverse order. Assuming arbitrarily that you want to retain the pair with the lexicographically lowest name first, consider:
DELETE
FROM yourTable
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.name2 = yourTable.name1 AND
t2.name1 = yourTable.name2) AND
name1 >= name2;
This assumes that your table has 2 columns name1
and name2
which store the relationships. For example, the following data:
name1 | name2
Jack | Henry
Henry | Jack
Paul | Revere
would become:
name1 | name2
Henry | Jack
Paul | Revere