I am able to get the correct results using UNION ALL - but have been trying to get this to work using GROUPING SETS without success. Is this possible?
Example SQL:
--Build data table
WITH TABLE_1 AS ( --data
SELECT 1 T1_ID, 2 VAL UNION
SELECT 2 T1_ID, 4 VAL UNION
SELECT 3 T1_ID, 6 VAL UNION
SELECT 4 T1_ID, 8 VAL UNION
SELECT 5 T1_ID, 10 VAL
),
TABLE_2 AS ( --first level join
SELECT 1 T1_ID, 'AA' T2_ID UNION
SELECT 2 T1_ID, 'AA' T2_ID UNION
SELECT 3 T1_ID, 'BB' T2_ID UNION
SELECT 4 T1_ID, 'BB' T2_ID UNION
SELECT 5 T1_ID, 'BB' T2_ID
),
TABLE_3 AS ( --second level join
SELECT 1 T1_ID, 'CCC' T3_ID UNION
SELECT 2 T1_ID, 'CCC' T3_ID UNION
SELECT 3 T1_ID, 'CCC' T3_ID UNION
SELECT 4 T1_ID, 'CCC' T3_ID UNION
SELECT 5 T1_ID, 'CCC' T3_ID UNION
SELECT 1 T1_ID, 'DDD' T3_ID UNION
SELECT 2 T1_ID, 'DDD' T3_ID UNION
SELECT 3 T1_ID, 'DDD' T3_ID UNION
SELECT 4 T1_ID, 'DDD' T3_ID UNION
SELECT 5 T1_ID, 'DDD' T3_ID
)
This is what the data table looks like fully joined -
SELECT
TABLE_1.T1_ID,
TABLE_1.VAL,
TABLE_2.T2_ID,
TABLE_3.T3_ID
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
T1_ID VAL T2_ID T3_ID
1 2 AA CCC
1 2 AA DDD
2 4 AA CCC
2 4 AA DDD
3 6 BB CCC
3 6 BB DDD
4 8 BB CCC
4 8 BB DDD
5 10 BB CCC
5 10 BB DDD
--This gives me the results I need (Sum VAL by distinct AA, BB, AA+BB, nulls) -
SELECT
T2_ID,
NULL T3_ID,
SUM(VAL) TOTAL
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
GROUP BY T2_ID
UNION ALL
SELECT
NULL T2_ID,
T3_ID,
SUM(VAL) TOTAL
FROM TABLE_1
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY T3_ID
UNION ALL
SELECT
T2_ID,
T3_ID,
SUM(VAL) TOTAL
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY T2_ID, T3_ID
UNION ALL
SELECT
NULL T2_ID,
NULL T3_ID,
SUM(VAL) TOTAL
FROM TABLE_1
CORRECT RESULTS:
T2_ID T3_ID TOTAL
AA NULL 6
BB NULL 24
NULL CCC 30
NULL DDD 30
AA CCC 6
BB CCC 24
AA DDD 6
BB DDD 24
NULL NULL 30
When I try using GROUPING SETS -
SELECT
GROUPING_ID (T2_ID, T3_ID) GRP_ID,
T2_ID,
T3_ID,
SUM(VAL) TOTAL
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY
GROUPING SETS ((),
T2_ID,
T3_ID,
(T2_ID,T3_ID))
My results are doubling where GRP_ID = 1 or 3 shown below
-----------------------------------------
GRP_ID T2_ID T3_ID TOTAL
0 AA CCC 6
0 BB CCC 24
2 NULL CCC 30
0 AA DDD 6
0 BB DDD 24
2 NULL DDD 30
1 AA NULL 12
1 BB NULL 48
3 NULL NULL 60
UNION ALL approach works - but I keep thinking there should be a way to use GROUPING SETS.
EDIT: I need to also add the reason why I'm looking for a grouping set solution - the real-world version of this problem is not limited to two joins used to drive the group conditions (this can come from many joins), and any of them can end up duplicating rows like the TABLE_3 in the example. With this, the real-world grouping would look more like this, for example:
grouping sets ((),
(a, b)
(a, c)
(a, d)
(a, b, c)
(a, b, c, d)
a,
b,
c,
d)
CORRECT ANSWER - this works for N grouping levels and combinations -
The "SUB" CTE is needed to assign TABLE_1 row value for each individual grouping level, so that the summarizations are correct for each group level, which is done in the main/final query.
But there is a performance hit after all; with my actual data set, the UNION ALL was completing in 17 seconds - the GROUPING SETS option in 33 seconds.
SUB AS (
SELECT
GROUPING_ID (TABLE_2.T2_ID, TABLE_3.T3_ID) GROUP_ID,
TABLE_1.T1_ID,
TABLE_2.T2_ID,
TABLE_3.T3_ID,
MAX(TABLE_1.VAL) VAL
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
GROUP BY
GROUPING SETS((),
TABLE_2.T2_ID,
TABLE_3.T3_ID,
(TABLE_2.T2_ID, TABLE_3.T3_ID)
),
TABLE_1.T1_ID
)
--FINAL
SELECT
GROUP_ID,
T2_ID,
T3_ID,
SUM(VAL) TOTAL_VAL
FROM SUB
GROUP BY
GROUP_ID,
T2_ID,
T3_ID
(OLD ANSWER) - gets the same result as example table, BUT when other joins multiply the rows, it is not correct.
SELECT
GRP_ID,
T2_ID,
T3_ID,
SUM(TOTAL)
FROM (
SELECT DISTINCT
GROUPING_ID (T2_ID, T3_ID) GRP_ID,
SEMI_FINAL.T2_ID,
SEMI_FINAL.T3_ID,
SUM(VAL) TOTAL
FROM (
SELECT TABLE_1.*, T2_ID, T3_ID,
RANK() OVER (ORDER BY TABLE_2.T2_ID) RANK2,
RANK() OVER (ORDER BY TABLE_3.T3_ID) RANK3
FROM TABLE_1
LEFT JOIN TABLE_2 ON TABLE_2.T1_ID = TABLE_1.T1_ID
LEFT JOIN TABLE_3 ON TABLE_3.T1_ID = TABLE_1.T1_ID
) SEMI_FINAL
GROUP BY
GROUPING SETS ((),
SEMI_FINAL.T3_ID,
SEMI_FINAL.T2_ID,
(SEMI_FINAL.T2_ID, SEMI_FINAL.T3_ID))
, RANK2
, RANK3
) FINAL_TABLE
GROUP BY
GRP_ID,
T3_ID,
T2_ID
And this returns the correct results -
GRP_ID T2_ID T3_ID TOTAL
0 AA CCC 6
0 BB CCC 24
0 AA DDD 6
0 BB DDD 24
1 AA NULL 6
1 BB NULL 24
2 NULL CCC 30
2 NULL DDD 30
3 NULL NULL 30