sqlgroup-byduplicatesnotin

SQL: Deleting Duplicates using Not in and Group by


I have the following SQL Syntax to delete duplicate rows, but never are any rows affected.

DELETE FROM content_stacks WHERE id NOT IN (
SELECT id 
FROM content_stacks
GROUP BY user_id, content_id
);

The subquery itself is returning the id list of first entries correctly.

SELECT id 
FROM content_stacks
GROUP BY user_id, content_id

When I'm inserting the results list as a string it is working, too:

DELETE FROM content_stacks WHERE id NOT IN (239,231,217,218,219,232,233,220,230,226,234,235,224,225,221,223,222,227,228,229,236,237,238,216,208,209,210,204,211,212,242,203,240,201,241,205,206,207,213,214,215);

I checked many similar examples and this should be working in my opinion. What am I missing?


Solution

  • The code from Meyssam Toluie is not working as it is but I made a similar solution with the same idea with rownumbers:

    DELETE FROM content_stacks WHERE id IN
    (SELECT id FROM (
          SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id)row_num
          FROM content_stacks
        ) sub
    WHERE row_num > 1)
    

    This is working for me now.

    My first command did not work because: The group by command does not show all ids in the output, but they are still there, so in fact all ids were returned in the NOT IN id-list. The row number seems to be the easiest way for this problem.