Simplifying, I have the following data:
Col1 | Col2 |
---|---|
A | X |
A | Y |
A | Z |
B | X |
B | Y |
B | Z |
C | Z |
I need to receive the following result:
Col1 | Col2 |
---|---|
A | X |
B | Y |
C | Z |
In other words: For each value in the left column, I need to assign the minimum UNUSED value from the right column (no duplicates). This was easy to do iteratively, i.e., with cursors. However, I would like something that's a thousand times faster.
What I've tried
Unsurprisingly, select L,min(R)
gives the wrong result. I've tried partitioning over several window functions, but I can't get the right combination. I always get the following incorrect result:
Col1 | Col2 |
---|---|
A | X |
B | X |
C | Z |
I've loaded some of the data into https://dbfiddle.uk/6HbpdlYd.
Here are 142 rows, created from 139 distinct L values, and 139 distinct R values.
Since the input data is produced by a join, there is always exactly one correct solution.
I would use this query:
SELECT l.Col1, r.Col2
FROM (
SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
FROM MyTable
GROUP BY Col1
) l
JOIN (
SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
FROM MyTable
GROUP BY Col2
) r
ON l.rn = r.rn
ORDER BY l.Col1;
Explanation:
Each subquery builds a distinct, ordered list of values (Col1
or Col2
) and numbers them with ROW_NUMBER()
.
Joining on the row number pairs the first Col1
with the first Col2
, the second with the second, etc.
I would do it this way because:
ROW_NUMBER() OVER (PARTITION BY Col1 …)
only picks one Col2 per Col1, not a global "zip".
Using DISTINCT
and ROW_NUMBER()
in the same query doesn’t deduplicate correctly, since ROW_NUMBER()
is assigned before DISTINCT
.
If one list is longer, only as many rows as the shorter list appear; use LEFT JOIN
if you want to keep all Col1
s.
Tested with your sample data on this db<>fiddle
A small example:
Input:
Col1 | Col2
-----+-----
A | X
A | Y
B | Z
C | Y
Distinct lists after numbering:
Col1 list Col2 list
--------- ---------
A | 1 X | 1
B | 2 Y | 2
C | 3 Z | 3
Final result:
Col1 | Col2
-----+-----
A | X
B | Y
C | Z