Single sentence to explain the scenario. I work for a pharmacy chain, our patient management software supplier allows the creation of duplicate patients without check, I need to find them and pass to each branch to remove.
I need to get a list of all patients in a branch that is a duplicate and doesn't appear in a separate table which shows they have been given a script. i.e. if they exist more than once and never got a script they are an empty patient record and need deleted.
Simplified tables
Patient
PatientId | CardId | Surname | Forenames | DoB | Postcode |
---|---|---|---|---|---|
123 | xyz | Doe | John | 1970-01-01 | AB312EX |
234 | abc | Smith | Alan | 1950-02-01 | AB311AB |
456 | fgh | Doe | John | NULL | NULL |
789 | uvw | Doe | John | NULL | NULL |
098 | qwe | Jones | Tom | NULL | NULL |
Script
ScriptId | PatientId | DateSupplied |
---|---|---|
1 | 234 | 2025-03-20 |
2 | 123 | 2025-03-21 |
3 | 098 | 2025-03-24 |
As you see the branch can provide as much patient info as they choose, the patientId and CardId is created by the system, NULLs in columns such as DoB or Postcode, won't prevent a script from being generated.
In that example, I would need only John Doe to appear for PatientId 456 and 789 as they are identical across the last 4 columns and their patientId doesn't appear in the script table.
This is what I thought would be the first part, and it seems to show correctly all the patients appearing more than once
WITH cte AS (
SELECT p.Surname, p.ForeNames, p.DateOfBirth, p.PostCode, COUNT(*) AS [Records]
FROM pmr.Patient p
WHERE p.Branch = 9
GROUP BY p.Surname, p.ForeNames, p.DateOfBirth, p.PostCode
HAVING COUNT(*) > 1
)
SELECT * FROM cte ORDER BY Surname, Forenames
After this I add the details I need in the select and join back to the cte. I need the CardId to allow the branch to find the correct record and I need the patientId to link to the Script table but can't use them in a group by count as it is unique for each patient record.
WITH cte AS (
SELECT p.Surname, p.ForeNames, p.DateOfBirth, p.PostCode, COUNT(*) AS [Records]
FROM pmr.Patient p
WHERE p.Branch = 9
GROUP BY p.Surname, p.ForeNames, p.DateOfBirth, p.PostCode
HAVING COUNT(*) > 1
)
SELECT
p.CardId, p.Surname, p.ForeNames, p.DateOfBirth, p.PostCode
FROM pmr.Patient p
INNER JOIN cte ON
cte.Surname = p.Surname AND
cte.Forenames = p.ForeNames AND
cte.DateOfBirth = p.DateOfBirth AND
cte.Postcode = p.PostCode
ORDER BY p.Surname, p.Forenames
This however returns some rows from the first query but not all of them.
Can someone help point me in the right direction and advise how I would then only return the duplicate if it's patientId isn't in the script table.
Apologies, I wasn't clear so I created a debate around data validation, as I say my hands are tied to the branch system we are supplied, and being NHS, the options are limited.
Thanks to the suggestion from @ThomA, I used a windowed function Count to get what I was looking for, as mentioned from Alan, the second part was a simple join with a null check so here is the final query to obtain the list.
WITH cte AS (
SELECT PatientId, CardId, Surname, Forenames, DateOfBirth, PostCode, Branch,
COUNT(CardId) OVER(PARTITION BY Surname, ForeNames, DateOfBirth, PostCode) as [records]
FROM pmr.Patient
WHERE Branch = 9
)
SELECT cte.* FROM cte
LEFT JOIN pmr.[Session] s ON cte.PatientId = s.Patient AND cte.Branch = s.Branch
WHERE records > 1 AND s.Patient IS NULL
ORDER BY Surname, Forenames