sqlsql-serverwindow-functionsset-based

Assign unique values in a set-based approach


Simplifying, I have the following data:

Col1 Col2
A X
A Y
A Z
B X
B Y
B Z
C Z

I need to receive the following result:

Col1 Col2
A X
B Y
C Z

In other words: For each value in the left column, I need to assign the minimum UNUSED value from the right column (no duplicates). This was easy to do iteratively, i.e., with cursors. However, I would like something that's a thousand times faster.

What I've tried

Unsurprisingly, select L,min(R) gives the wrong result. I've tried partitioning over several window functions, but I can't get the right combination. I always get the following incorrect result:

Col1 Col2
A X
B X
C Z

I've loaded some of the data into https://dbfiddle.uk/6HbpdlYd.

Here are 142 rows, created from 139 distinct L values, and 139 distinct R values.

Since the input data is produced by a join, there is always exactly one correct solution.


Solution

  • I would use this query:

    SELECT l.Col1, r.Col2
    FROM (
        SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
        FROM MyTable
        GROUP BY Col1
    ) l
    JOIN (
        SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
        FROM MyTable
        GROUP BY Col2
    ) r
    ON l.rn = r.rn
    ORDER BY l.Col1;
    

    Explanation:

    1. Each subquery builds a distinct, ordered list of values (Col1 or Col2) and numbers them with ROW_NUMBER().

    2. Joining on the row number pairs the first Col1 with the first Col2, the second with the second, etc.

    3. I would do it this way because:

      • ROW_NUMBER() OVER (PARTITION BY Col1 …) only picks one Col2 per Col1, not a global "zip".

      • Using DISTINCT and ROW_NUMBER() in the same query doesn’t deduplicate correctly, since ROW_NUMBER() is assigned before DISTINCT.

    4. If one list is longer, only as many rows as the shorter list appear; use LEFT JOIN if you want to keep all Col1s.

    Tested with your sample data on this db<>fiddle

    A small example:

    Input:

    Col1 | Col2
    -----+-----
    A    | X
    A    | Y
    B    | Z
    C    | Y
    

    Distinct lists after numbering:

    Col1 list        Col2 list
    ---------        ---------
    A | 1            X | 1
    B | 2            Y | 2
    C | 3            Z | 3
    

    Final result:

    Col1 | Col2
    -----+-----
    A    | X
    B    | Y
    C    | Z