t-sqldense-rankwindowed

T-SQL Dense_Rank() increments for nulls in group


Good day, I am having trouble with my SQL query. I have a dataset that is representing nested objects ranging in depth from 3 items to 8 items. I am using dense_rank() over (Partition by [previous rank] to identify each item uniquely, but because of the nested nature of the data dense rank is not behaving the way I want.

Column A Column B Column C Column D Column E Column F Column G Column H
System Electric SubSystem DistributionSystem1 NULL NULL NULL NULL
System Electric SubSystem DistributionSystem2 NULL NULL NULL NULL
System Electric SubSystem Generators GeneratorCategory DieselGenerator BackupGenerator DieselGenerator1
System Electric SubSystem Load Center MCC MCC1 Cubicle Cubicle1
System Electric SubSystem Load Center MCC MCC1 Cubicle Cubicle2
System Electric SubSystem Load Center MCC MCC2 Cubicle Cubicle1
System Electric SubSystem Load Center1 MCC MCC2 Breaker Breaker 1
System Electric SubSystem Substation SubStationCategory SubStation1 Transformer Transformer1
System Cleaning SubSystem Cleaning Supply System Pump Pump 1 Cleaning Supply NULL NULL
System Cleaning SubSystem Cleaning Supply System Pump Pump 1 Cleaning Supply Motor Motor Pump 1 Cleaning Supply
System Cleaning SubSystem Cleaning Supply System Pump Pump 2 Cleaning Supply NULL NULL
System Cleaning SubSystem Cleaning Supply System Pump Pump 2 Cleaning Supply Motor Motor Pump 2 Cleaning Supply
System Cleaning SubSystem Cleaning Supply System Structure Cleaning Supply Pump Station Meter Flow Meter 1
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste NULL NULL
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste Meter Level Indicator Cleaning Waste
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste Motor Motor Pump 1 Cleaning Waste

I have tried many variations of the dense_rank() function

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]

Works fine, but since the value is always "system" that isn't all that confusing.

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]

Also works But

,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by case when [Column E] is null then 0 end, [Column D] ORDER BY [Column E], [Column D], [Column C], [Column B], [Column A]) END AS [Column E ID]

Is where I run into problems.

Select
    [Column A]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]
,   [Column B]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]
,   [Column C]
,   DENSE_RANK() over (Partition by [Column B] ORDER BY [Column A],[Column B], [Column C]) AS [Column C ID]
,   [Column D]
,   DENSE_RANK() over (Partition by [Column B]  ORDER BY [Column A],[Column B], [Column C], [Column D]) AS [Column D ID]
,   [Column E]
,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by [Column C],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E]) END AS [Column E ID]
,   [Column F]
,   case when [Column F] is null then NULL else DENSE_RANK() over (Partition by [Column E], [Column D], [Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F]) END AS [Column F ID]
,   [Column G]
,   case when [Column G] is null then NULL else DENSE_RANK() over (Partition by case when [Column F] is null then 0 end, [Column F], [Column C] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G]) END AS [Column G ID]
,   [Column H]
,   case when [Column H] is null then NULL else DENSE_RANK() over (Partition by [Column G], [Column F], [Column E], [Column D],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]) END AS [Column H ID]
From #tempDummyData
Order by [Column A], [Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]

ResultsFixed

The Picture shows the results, and I've annotated in red the expected results. The idea is to be able to concatenate the numbers together to create a unique ID that tells what rank, and position each item holds. so row 1 would be 1 While row 4 would be 1.2.1.4.1.1.1.1 Is it possible to get Dense_rank() to ignore nulls, but still reset with each group?


Solution

  • In addition to using a CASE expression suggested by Alan to suppress the rank results for null values, I expect you will also want to adjust the remaining values in the column. For DENSE_RANK(), that adjustment would be one if null values were present and zero otherwise.

    No simple straightforward way of doing this comes to mind, but it can be done using another windowed SUM() function that wraps another CASE expression.

    CASE WHEN ColX IS NOT NULL THEN
        DENSE_RANK()
            OVER(PARTITION BY ... ORDER BY ColX)
        - MAX(CASE WHEN ColX IS NULL THEN 1 ELSE 0 END)
            OVER(PARTITION BY ... ORDER BY ColX)
         END AS RankX
    

    A somewhat klugy alternative that eliminates the adjustment, is to use ISNULL() to replace null values with a values that always sorts last and thus doesn't affect the lower DENSE_RANK() values.

    CASE WHEN ColX IS NOT NULL THEN
        DENSE_RANK()
            OVER(PARTITION BY ... ORDER BY ISNULL(ColX, 'ZZZMaxValue'))
         END AS RankX
    

    (ELSE NULL is implied in the above case statements when not specified.)

    Sample results:

    ColA COlB COlC RankA RankB RankC
    null null null null null null
    A null null 1 null null
    A null X 1 null 1
    A null X 1 null 1
    A J null 1 1 null
    A J X 1 1 1
    A J Y 1 1 2
    A J Y 1 1 2
    A J Z 1 1 3
    A K X 1 2 1
    A K Y 1 2 2
    B J null 2 1 null
    B J X 2 1 1
    B J Y 2 1 2
    B K Z 2 2 1

    See this db<>fiddle for a demo containing both forms.