I have two tables: Players
and Games
(below). I need to delete the duplicate player names using a query with a Using query row_number()
function, but I need to leave the records that are in the Game
table. If there are multiple records, then any of them can be left. If there are no records in Games
table, then any of them can be left.
For example:
Players
ID | Name | |
---|---|---|
1 | Oliver | --> should be deleted (because ID=2 exists in Games table) |
2 | Oliver | |
3 | Jack | |
4 | Jack | --> should be deleted (there is no Jack records in Games table, may leave any of them) |
5 | Jack | --> should be deleted (there is no Jack records in Games table, may leave any of them) |
6 | Harry | |
7 | Harry | --> should be deleted (both records ID=6,7 are in Games table, may can delete either one) |
Games
ID | PlayerID |
---|---|
1 | 2 |
2 | 6 |
3 | 7 |
Result should be:
Players
ID | Name |
---|---|
2 | Oliver |
3 | Jack |
6 | Harry |
The usual pattern for deleting duplicates is to number the rows having the same name and delete all rows having row number > 1.
However, you have the additional criteria that you wish to retain all player rows having games. This can be done by:
Player
rows having any associated Games
rows.Player
rows having the same name, but place those having games at the top of the number sequence.Player
rows that both have row number > 1 and do not have an associated Games
row.WITH PlayersHasGames AS (
SELECT
*,
CASE WHEN EXISTS (SELECT * FROM Games G WHERE G.PlayerID = P.ID)
THEN 1 ELSE 0 END
AS HasGames
FROM Players P
),
NumberedPlayers AS (
SELECT
*,
ROW_NUMBER() OVER(
PARTITION BY Name
ORDER BY HasGames DESC, ID -- Players with games first
) AS RowNum
FROM PlayersHasGames
)
DELETE
FROM NumberedPlayers
WHERE RowNum > 1
AND HasGames = 0
Results:
ID | Name |
---|---|
2 | Oliver |
3 | Jack |
6 | Harry |
7 | Harry |
See this db<>fiddle for a demo.
If you wish to delete one of the "Harry" rows, even though both have associated Games
rows, you can exclude the AND HasGames = 0
condition. Note that this would violate referential integrity, as one of the Games
rows would be an orphan. See this db<>fiddle.
Renumbering the orphan rows would require an extra step to reassign PlayerId
values after numbering the Players, but before deleting the duplicates.
DECLARE @PlayerMapping TABLE (
FromPlayerID INT,
ToPlayerID INT
)
;WITH PlayersHasGames AS (
SELECT
*,
CASE WHEN EXISTS (SELECT * FROM Games G WHERE G.PlayerID = P.ID)
THEN 1 ELSE 0 END
AS HasGames
FROM Players P
),
NumberedPlayers AS (
SELECT
*,
ROW_NUMBER() OVER(
PARTITION BY Name
ORDER BY HasGames DESC, ID -- Players with games first
) AS RowNum
FROM PlayersHasGames
)
INSERT INTO @PlayerMapping
SELECT FromPlayer.ID AS FromPlayerID, ToPlayer.ID AS ToPlayerID
FROM NumberedPlayers FromPlayer
JOIN NumberedPlayers ToPlayer
ON ToPlayer.Name = FromPlayer.Name
AND ToPlayer.RowNum = 1 -- Keeper player
WHERE FromPlayer.RowNum > 1 -- Any player to be deleted (and mapped)
UPDATE G
SET PlayerId = PM.ToPlayerID
FROM Games G
JOIN @PlayerMapping PM ON PM.FromPlayerID = G.PlayerID
DELETE P
FROM Players P
JOIN @PlayerMapping PM ON PM.FromPlayerID = P.ID
Resulting Player Mapping:
FromPlayerID | ToPlayerID |
---|---|
1 | 2 |
4 | 3 |
5 | 3 |
7 | 6 |
Updated Players:
ID | Name |
---|---|
2 | Oliver |
3 | Jack |
6 | Harry |
Updated Games (showing player name lookup):
ID | PlayerID | PlayerName |
---|---|---|
1 | 2 | Oliver |
2 | 6 | Harry |
3 | 6 | Harry |
See this db<>fiddle