sqlaggregate-functionsdistinctdense-rank

Use Dense_Rank() to Calculate 2 Different Distinct Counts


I need to calculate two different distinct counts in the same query, based on different criteria. I'd like to accomplish this with dense_rank() if possible.

I need columns for:

  1. By resource, how many unique customers are in category X?
  2. By resource, how many unique customers are in category Y?

Sample data:

line res cat customer_id
1 A X 1
2 A X 1
3 A Y 1
4 A X 2
5 B X 1
6 B Y 1

Expected result:

RESOURCE UNIQUE CUSTOMERS CATEGORY X UNIQUE CUSTOMERS CATEGORY Y
A 2 1
B 1 1
DROP TABLE IF EXISTS #test_data 

CREATE TABLE #test_data
(
line int 
,res varchar(1)
,cat varchar(1)
,customer_id int
)

INSERT INTO #test_data (line,res,cat,customer_id) VALUES (1,'A','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (2,'A','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (3,'A','Y',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (4,'A','X',2)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (5,'B','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (6,'B','Y',1)

SELECT  DISTINCT 
    RESOURCE = t.res
    ,[UNIQUE CUSTOMERS CATEGORY X] = CASE WHEN t.cat <> 'X' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'X' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id ASC) END  +
                                        CASE WHEN t.cat <> 'X' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'X' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id DESC) - 1 END
    ,[UNIQUE CUSTOMERS CATEGORY Y] = CASE WHEN t.cat <> 'Y' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'Y' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id ASC) END  +
                                        CASE WHEN t.cat <> 'Y' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'Y' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id DESC) - 1 END
FROM
    #test_data t


Incorrect results (I don't want the NULL values and I want two lines instead of four):

RESOURCE UNIQUE CUSTOMERS CATEGORY X UNIQUE CUSTOMERS CATEGORY Y
A NULL 1
A 2 NULL
B NULL 1
B 1 NULL

Solution

  • You seem to be looking for conditional counts:

    select resource,
        count(distinct case when cat = 'X' then customer_id end) cnt_cust_in_x,
        count(distinct case when cat = 'Y' then customer_id end) cnt_cust_in_y
    from #test_data
    group by resource
    

    I don’t see how dense_rank would be helpful here.

    In databases that support the standard filter clause of aggregate functions (such as Postgres):

    select resource,
        count(distinct customer_id) filter(where cat = 'X') cnt_cust_in_x,
        count(distinct customer_id) filter(where cat = 'Y') cnt_cust_in_y
    from #test_data
    group by resource