sqlsql-server

Find median with given frequency of number


Wanted to write a SQL Server query to find median with given frequency of number.

Table:

+----------+-------------+
|  number  |  frequency  |
+----------+-------------|
|  2       |  7          |
|  3       |  1          |
|  5       |  3          |
|  7       |  1          |
+----------+-------------+

In this table, the numbers are 2, 2, 2, 2, 2, 2, 2, 3, 5, 5, 5, 7 so the median is (2 + 2) / 2 = 2

I have created the following query using recursive CTE sqlfiddle, is there a better way to write this one with more efficiency?

My solution:

/* recursive CTE to generate numbers with given frequency */

with nums as
(
    select
        number,
        frequency
    from numbers

    union all

    select
        number,
        frequency - 1
    from nums
    where frequency > 1
)

/* find median */

select
    round(avg(number * 1.0), 2) as median
from
(
    select
        number,
        count(*) over () as ttl,
     row_number() over (order by number) as rnk
    from nums
) t
where rnk = (case when ttl%2 != 0 then (ttl/2) else (ttl/2)+1 end)
or rnk = (case when ttl%2 = 0 then (ttl/2)+1 end)

Solution

  • Just do a cumulative sum and take the middle value. I think this is the logic:

    select avg(number)
    from (select t.*,
                 sum(freq) over (order by number asc) as sum_freq,
                 sum(freq) over () as cnt
          from t
         ) t
    where cnt <= 2 * sum_freq and
          cnt >= 2 * (sum_freq - freq);
    

    Here is a db<>fiddle.