I am trying to retrieve a table of numbers based on a shared value for a provided table of identities. However, I only want distinct values for the related numbers based on a couple conditions. For example:
-- Table of existing data
SELECT *
INTO #data
FROM (
VALUES (1, 05202502)
,(3, 05202503)
,(5, 05202502)
,(6, 05202501)
,(8, 05202501)
,(32, 05202501)
) d([No], [Value])
-- Table of inputs to find related numbers
SELECT *
INTO #input
FROM (
VALUES (1), (3), (5), (6), (8)
) i([Id])
-- Get list of "No" based on matching "Value" for the input "Id"
SELECT i.[Id]
,d2.[No]
FROM #input i
LEFT JOIN #data d1
ON i.[Id] = d1.[No]
LEFT JOIN #data d2
ON d1.[Value] = d2.[Value]
DROP TABLE #input
DROP TABLE #data
The problem is, if the input table contains values that already share the same value in the data table, then the same number will be returned multiple times. For example, the result of the above code is:
Id | No |
---|---|
1 | 1 |
5 | 1 |
3 | 3 |
1 | 5 |
5 | 5 |
6 | 6 |
8 | 6 |
6 | 8 |
8 | 8 |
6 | 32 |
8 | 32 |
If there are duplicates of a number in the "No" column and one of those duplicate rows equals the "Id" column (e.g., the first two rows above), then I want it to return only the row where the "No" column equals the "Id" column.
If there are duplicates of a number in the "No" column and none of the duplicate rows equal the "Id" column (e.g., the last two rows above), then it can return any of the rows, but I still want only one row.
In other words, there should be no duplicates in the "No" column, and it matters which row is kept based on the "Id" column. Based on this, I'd want the following table returned:
Id | No |
---|---|
1 | 1 |
3 | 3 |
5 | 5 |
6 | 6 |
8 | 8 |
6 | 32 |
Note: the last Id could've been 6 or 8.
Any idea how I might do this?
EDIT: Changed the second JOIN
from a RIGHT JOIN
to a LEFT JOIN
. The RIGHT JOIN
yielded too many results, whereas the LEFT JOIN
only returns the values based on the #input
table, which is the desired behavior.
select
-- if Id matched then keep it, otherwise find the smallest one with the same Value
coalesce(
i.Id,
min(d.No) over (partition by d.Value)
) as Id,
d.No
from #data d left outer join #input i
on i.Id = d.No;
Starting with the original query you could have instead grouped and conditionally aggregated. Overall I think it's more confusing though:
select
coalesce(
min(case when i.Id = d2.No then i.Id end),
min(i.Id)
) as Id,
d2.No
from ... -- left join plus right join combo
group by d2.No;
Note: If it is somehow possible to (accidentally?) duplicate any input values then the second query has a minor advantage in not needing de-duplication. The addition of distinct
in the first query would be sufficient to eliminate them in that scenario.