I'm not sure if this can be done in one stored procedure. I got it working using nested views but I want it to be as concise as possible.
Every time I try to select from a subquery, I get errors that I can't debug. I understand the concepts of how to get an output but struggle with the syntax.
Suppose we have a large table listing every household in the United States in the following format:
[dbo].[US_HOUSEHOLDS_TABLE]
HOUSEHOLD_ID | FAMILY_NAME | MEMBER_NAME |
---|---|---|
1 | Anderson | Robert |
1 | Anderson | Mary |
1 | Anderson | Cody |
1 | Anderson | Brittany |
2 | Carver | William |
2 | Carver | Suzanne |
3 | Washington | Fred |
3 | Washington | Ethel |
3 | Washington | Tony |
4 | Smith | Darlene |
4 | Smith | Amy |
4 | Smith | Richard |
Desired output:
FAMILY_SIZE | FREQUENCY |
---|---|
1 | 23,903,124 |
2 | 57,116,548 |
3 | 38,617,295 |
4 | 12,397,106 |
5 | 3,108,284 |
6 or more | 1,393,571 |
You can do something like this perhaps:
SELECT *
INTO #data
FROM (
VALUES (1, N'Anderson', N'Robert')
, (1, N'Anderson', N'Mary')
, (1, N'Anderson', N'Cody')
, (1, N'Anderson', N'Brittany')
, (2, N'Carver', N'William')
, (2, N'Carver', N'Suzanne')
, (3, N'Washington', N'Fred')
, (3, N'Washington', N'Ethel')
, (3, N'Washington', N'Tony')
, (4, N'Smith', N'Darlene')
, (4, N'Smith', N'Amy')
, (4, N'Smith', N'Richard')
, (4, N'Smith', N'Darlene')
, (4, N'Smith', N'Amy')
, (4, N'Smith', N'Richard')
, (5, 'Sigge', 'Mannen')
, (5, 'Stack', 'Overflow')
) t (HOUSEHOLD_ID,FAMILY_NAME,MEMBER_NAME)
SELECT CASE WHEN cnt < 6 THEN CAST(cnt AS VARCHAR(30)) ELSE '6 or more' END AS HouseHold
, COUNT(*) AS Total
FROM (
SELECT count(*) AS cnt
FROM #data
GROUP BY HOUSEHOLD_ID
) x
GROUP BY CASE WHEN cnt < 6 THEN CAST(cnt AS VARCHAR(30)) ELSE '6 or more' END
First, you get the count by the household and then you group by that count.
Those with 6+ get their own group.
Outputs:
HouseHold | Total |
---|---|
2 | 2 |
3 | 1 |
4 | 1 |
6 or more | 1 |