Is there a good way to calculate a median when using GROUP BY GROUPING SETS in SQL server?
I'm trying to generate some analysis for different groupings. All the other aggregations (COUNT/ SUM/ AVG/ MIN/ MAX) work fine with GROUPING SETS.
Using PERCENTILE_CONT to calculate a median fails as [value] 'is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause'
I'm not wedded to PERCENTILE_CONT, but am struggling to think of a solution other than a very long UNION of this whole query repeated 4 times, but just using GROUP BY for one of the grouping sets at a time. Is there a better alternative I'm missing?
SELECT
CASE
WHEN [type] IS NOT NULL AND [period] IS NOT NULL AND [area] IS NOT NULL THEN 'LA by event type by month'
WHEN [type] IS NOT NULL AND [period] IS NULL AND [area] IS NOT NULL THEN 'LA by event type any time'
WHEN [type] IS NOT NULL AND [period] IS NOT NULL AND [area] IS NULL THEN 'England event type by month'
WHEN [type] IS NOT NULL AND [period] IS NULL AND [area] IS NULL THEN 'England event type any time'
ELSE 'CHECK GROUPING' END AS [grouping_type]
,[type]
,[period]
,[area]
,COUNT(*) AS [number_projects]
,SUM([units]) AS [total_units]
,AVG([value]) as [mean_value]
,MIN([value]) AS [min_value]
,MAX([value]) AS [max_value]
/*INCLUDING THIS CAUSES AN ERROR*/
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [value]) OVER() AS median_value
FROM tb_source main
GROUP BY
GROUPING SETS (
([type])
,([type], [period])
,([type], [area])
,([type], [period], [area])
,())
PERCENTILE_CONT
is a window function, which means you need to put it into a derived table, you can't use it at the aggregation level unless you are only taking the median after aggregation.
@AaronBertrand has a long article explaining all the different median options. But PERCENTILE_CONT
is the simplest, especially in your case, although it's probably not very efficient.
SELECT
CASE WHEN type IS NOT NULL THEN
CASE WHEN [period] IS NOT NULL THEN
CASE WHEN area IS NOT NULL THEN
'LA by event type by month'
ELSE
'England event type by month'
END
ELSE
CASE WHEN area IS NOT NULL THEN
'LA by event type any time'
ELSE
'England event type any time'
END
END
ELSE
'CHECK GROUPING'
END AS grouping_type
,type
,period
,area
,COUNT(*) AS number_projects
,SUM(units) AS total_units
,AVG(value) as mean_value
,MIN(value) AS min_value
,MAX(value) AS max_value
,CASE WHEN type IS NOT NULL THEN
CASE WHEN [period] IS NOT NULL THEN
CASE WHEN area IS NOT NULL THEN
MIN(median_type_period_area)
ELSE
MIN(median_type_period)
END
ELSE
CASE WHEN area IS NOT NULL THEN
MIN(median_type_area)
ELSE
MIN(median_type)
END
END
ELSE
MIN(median_all)
END AS median_value
FROM (
SELECT *,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY type) AS median_type,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY type, [period]) AS median_type_period,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY type, area) AS median_type_area,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY type, [period], area) AS median_type_period_area,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER () AS median_all,
FROM tb_source main
) main
GROUP BY
GROUPING SETS (
(type)
(type, [period]),
(type, area),
(type, [period], area),
()
);