I'm looking for a (T-)SQL solution to the following question. I have a table with columns
Values in the Key
column may be present in multiple rows. As a result set I need a row for each Key
with values in the Column1
-Column3
taken from potentially different rows in the initial data, where ColumnX
value is not null and ordered by SortOrder
. Hopefully it's easier to understand this from the example:
Key | Column1 | Column2 | Column3 | SortOrder |
---|---|---|---|---|
Key1 | C1 | NULL | NULL | 1 |
Key1 | C10 | C11 | NULL | 2 |
Key1 | C20 | C21 | C22 | 3 |
Key2 | NULL | C30 | NULL | 1 |
Key2 | C40 | C41 | NULL | 2 |
The result set in this case would be
Key | Column1 | Column2 | Column3 |
---|---|---|---|
Key1 | C1 | C11 | C22 |
Key2 | C40 | C30 | NULL |
There are 100K-200K rows in the table so I'm looking for a good performing solution. Ideally one which reads the table only once but I'm not sure it's even possible (I'll gladly take analytical functions if they help performance or readability in this case). Suggestions for indexing are also welcome though I assume it will be by (Key, SortOrder) anyway.
I have a query which returns the desired result but I'm not quite happy with readability and especially performance:
WITH Keys AS (SELECT Key FROM MyTable GROUP BY Key)
SELECT
Key,
(SELECT TOP(1) A.Column1 FROM MyTable A WHERE A.Column1 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column1,
(SELECT TOP(1) A.Column2 FROM MyTable A WHERE A.Column2 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column2,
(SELECT TOP(1) A.Column3 FROM MyTable A WHERE A.Column3 IS NOT NULL AND A.Key=Keys.Key ORDER BY SortOrder) AS Column3
FROM Keys
You can use the new IGNORE NULLS option when doing FIRST_VALUE to get that you're looking for.
More old-school alternative is to use first_value(column1) OVER (partition BY [key] ORDER BY case when column1 is null then 1 else 0 end, sortorder)
which manually puts the NULLs last.
Finally, a trick i like is to use aggregate together with padding to generate a combined sort + value column, after which the sort part is removed to leave the Value. This method has the upside that it doesn't require any window functions and is just a simple aggregate.
SELECT [key]
, max(c1) AS c1
, max(c2) AS c2
, max(c3) AS c3
, MAX(c1_old) AS c1_old
, MAX(c2_old) AS c2_old
, MAX(c3_old) AS c3_old
, STUFF(MIN(RIGHT(REPLICATE('0', 10) + CAST(SortOrder AS VARCHAR(10)), 10) + column1), 1, 10, '') AS c1_agg
, STUFF(MIN(RIGHT(REPLICATE('0', 10) + CAST(SortOrder AS VARCHAR(10)), 10) + column2), 1, 10, '') AS c2_agg
, STUFF(MIN(RIGHT(REPLICATE('0', 10) + CAST(SortOrder AS VARCHAR(10)), 10) + column3), 1, 10, '') AS c3_agg
FROM (
SELECT *
, first_value(column1) ignore nulls OVER(partition BY [key] ORDER BY sortorder rows between unbounded preceding and current row) c1
, first_value(column2) ignore nulls OVER(partition BY [key] ORDER BY sortorder rows between unbounded preceding and current row) c2
, first_value(column3) ignore nulls OVER(partition BY [key] ORDER BY sortorder rows between unbounded preceding and current row) c3
, FIRST_VALUE(column1) OVER(partition BY [key] ORDER BY case when column1 is null then 1 else 0 end, sortorder rows between unbounded preceding and current row) AS c1_old
, FIRST_VALUE(column2) OVER(partition BY [key] ORDER BY case when column2 is null then 1 else 0 end, sortorder rows between unbounded preceding and current row) AS c2_old
, FIRST_VALUE(column3) OVER(partition BY [key] ORDER BY case when column3 is null then 1 else 0 end, sortorder rows between unbounded preceding and current row) AS c3_old
FROM
(
VALUES (N'Key1', N'C1', NULL, NULL, 1)
, (N'Key1', N'C10', N'C11', NULL, 2)
, (N'Key1', N'C20', N'C21', N'C22', 3)
, (N'Key2', NULL, N'C30', NULL, 1)
, (N'Key2', N'C40', N'C41', NULL, 2)
) t ([Key],Column1,Column2,Column3,SortOrder)
) x
GROUP BY [key]
Output:
key | c1 | c2 | c3 | c1_old | c2_old | c3_old | c1_agg | c2_agg | c3_agg |
---|---|---|---|---|---|---|---|---|---|
Key1 | C1 | C11 | C22 | C1 | C11 | C22 | C1 | C11 | C22 |
Key2 | C40 | C30 | NULL | C40 | C30 | NULL | C40 | C30 | NULL |