
Fastest "Get Duplicates" SQL script


What is an example of a fast SQL query to get duplicates in datasets with hundreds of thousands of records? I typically use something like:

SELECT afield1, afield2 FROM afile a 
WHERE 1 < (SELECT count(afield1) FROM afile b WHERE a.afield1 = b.afield1);

But this is quite slow.


Solution

  • A more direct way is to group by the field and keep only the groups with more than one row:

    SELECT afield1, count(afield1) FROM afile
    GROUP BY afield1 HAVING count(afield1) > 1;
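
Note that the grouped query returns each duplicated key with its count, not the individual rows that the query in the question returned. If the full rows are needed, one option is to join the table back to the grouped keys. Below is a minimal sketch of both forms, run through Python's built-in sqlite3 module so it can be tried as-is. The sample rows are made up for illustration, and the table and column names are taken from the question; behaviour and speed on other database engines may differ.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("a", "x"), ("a", "y"), ("b", "z"), ("c", "p"), ("c", "q"), ("c", "r")],
)

# Keys that occur more than once, with their counts (GROUP BY / HAVING form).
dup_keys = conn.execute(
    "SELECT afield1, count(afield1) FROM afile "
    "GROUP BY afield1 HAVING count(afield1) > 1 "
    "ORDER BY afield1"
).fetchall()

# Full duplicate rows, obtained by joining back to the grouped keys.
dup_rows = conn.execute(
    "SELECT a.afield1, a.afield2 FROM afile a "
    "JOIN (SELECT afield1 FROM afile "
    "      GROUP BY afield1 HAVING count(afield1) > 1) d "
    "ON a.afield1 = d.afield1 "
    "ORDER BY a.afield1, a.afield2"
).fetchall()

print(dup_keys)
print(dup_rows)
```

On most engines an index on afield1 is likely to help either form, but that is worth checking against the actual query plan.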