Wanted to write a SQL Server query to find median with given frequency of number.
Table:
+----------+-------------+
| number | frequency |
+----------+-------------|
| 2 | 7 |
| 3 | 1 |
| 5 | 3 |
| 7 | 1 |
+----------+-------------+
In this table, the numbers are 2, 2, 2, 2, 2, 2, 2, 3, 5, 5, 5, 7
so the median is (2 + 2) / 2 = 2
I have created the following query using recursive CTE sqlfiddle, is there a better way to write this one with more efficiency?
My solution:
/* recursive CTE to generate numbers with given frequency */
with nums as
(
select
number,
frequency
from numbers
union all
select
number,
frequency - 1
from nums
where frequency > 1
)
/* find median */
select
round(avg(number * 1.0), 2) as median
from
(
select
number,
count(*) over () as ttl,
row_number() over (order by number) as rnk
from nums
) t
where rnk = (case when ttl%2 != 0 then (ttl/2) else (ttl/2)+1 end)
or rnk = (case when ttl%2 = 0 then (ttl/2)+1 end)
Just do a cumulative sum and take the middle value. I think this is the logic:
select avg(number)
from (select t.*,
sum(freq) over (order by number asc) as sum_freq,
sum(freq) over () as cnt
from t
) t
where cnt <= 2 * sum_freq and
cnt >= 2 * (sum_freq - freq);
Here is a db<>fiddle.