sqlmysqlpostgresqlduplicate-data

Case insensitive duplicates SQL


So I have a users table where the user.username has many duplicates like:

username and Username and useRnAme
john and John and jOhn

That was a bug and these three records should have been only one.

I'm trying to come up with a SQL query that lists all of these cases ordered by their creation date, so ideally the result should be something like this:

username jan01
useRnAme jan02
Username jan03
john     feb01 
John     feb02
jOhn     feb03

Any suggestions will be much appreciated


Solution

  • Leaving aside the issue of case sensitivity for a moment, the basic strategy is:

     SELECT username, create_date FROM your_table
         WHERE username IN 
         (SELECT username FROM your_table GROUP BY username HAVING COUNT(*) > 1)
     ORDER BY username, create_date
    

    Many RDBMSes (including MySQL assuming that you are using CHAR or VARCHAR for the username column), perform case-insensitive searching by default. For those databases, the above solution will work. To solve the case sensitivity issue for other products , wrap all except the first occurrence of username in the uppercase conversion function specific to your RDBMS:

     SELECT username, create_date FROM your_table
         WHERE UPPER(username) IN 
         (SELECT UPPER(username) FROM your_table GROUP BY UPPER(username) HAVING COUNT(*) > 1)
     ORDER BY username, create_date