I need to calculate two different distinct counts in the same query, based on different criteria. I'd like to accomplish this with dense_rank() if possible.
I need columns for:
Sample data:
line | res | cat | customer_id |
---|---|---|---|
1 | A | X | 1 |
2 | A | X | 1 |
3 | A | Y | 1 |
4 | A | X | 2 |
5 | B | X | 1 |
6 | B | Y | 1 |
Expected result:
RESOURCE | UNIQUE CUSTOMERS CATEGORY X | UNIQUE CUSTOMERS CATEGORY Y |
---|---|---|
A | 2 | 1 |
B | 1 | 1 |
DROP TABLE IF EXISTS #test_data
CREATE TABLE #test_data
(
line int
,res varchar(1)
,cat varchar(1)
,customer_id int
)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (1,'A','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (2,'A','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (3,'A','Y',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (4,'A','X',2)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (5,'B','X',1)
INSERT INTO #test_data (line,res,cat,customer_id) VALUES (6,'B','Y',1)
SELECT DISTINCT
RESOURCE = t.res
,[UNIQUE CUSTOMERS CATEGORY X] = CASE WHEN t.cat <> 'X' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'X' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id ASC) END +
CASE WHEN t.cat <> 'X' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'X' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id DESC) - 1 END
,[UNIQUE CUSTOMERS CATEGORY Y] = CASE WHEN t.cat <> 'Y' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'Y' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id ASC) END +
CASE WHEN t.cat <> 'Y' THEN NULL ELSE DENSE_RANK() OVER (PARTITION BY CASE WHEN t.cat <> 'Y' THEN 0 ELSE 1 END, t.res ORDER BY t.customer_id DESC) - 1 END
FROM
#test_data t
Incorrect results (I don't want the NULL values and I want two lines instead of four):
RESOURCE | UNIQUE CUSTOMERS CATEGORY X | UNIQUE CUSTOMERS CATEGORY Y |
---|---|---|
A | NULL | 1 |
A | 2 | NULL |
B | NULL | 1 |
B | 1 | NULL |
You seem to be looking for conditional counts:
select resource,
count(distinct case when cat = 'X' then customer_id end) cnt_cust_in_x,
count(distinct case when cat = 'Y' then customer_id end) cnt_cust_in_y
from #test_data
group by resource
I don’t see how dense_rank
would be helpful here.
In databases that support the standard filter
clause of aggregate functions (such as Postgres):
select resource,
count(distinct customer_id) filter(where cat = 'X') cnt_cust_in_x,
count(distinct customer_id) filter(where cat = 'Y') cnt_cust_in_y
from #test_data
group by resource