snowflake-cloud-data-platformwindow-functionsmedian

Median with a sliding window


The goal is to use MEDIAN as a window function with a sliding window of a specific size.

SELECT *,
      MEDIAN(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM test_data
ORDER BY id;
Sliding window frame unsupported for function MEDIAN

For sample data:

CREATE OR REPLACE TEMP TABLE test_data
AS
SELECT index AS id, ABS(RANDOM(5))%100 AS n
FROM TABLE(FLATTEN(ARRAY_GENERATE_RANGE(1,10)));

The expected output for a sliding window of size 4 is:

ID N MEDIAN
0 74 74 74
1 28 51 (28+74)/2 = 51
2 0 28 28
3 18 23 (18+28)/2 = 23
4 44 23 (18+28)/2 = 23
5 57 31 (18+44)/2 = 31
6 31 37.5 (31+44)/2 = 37.5
7 27 37.5 (31+44)/2 = 37.5
8 96 44 (31+57)/2 = 44

Seeking for SQL only approach.


Solution

  • Median with a sliding window can be emulated by using ARRAY_AGG, ARRAY_SORT and choosing the "middle" value:

    SELECT *, 
      ARRAY_SORT(ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)) AS sorted_arr,
      CASE WHEN ARRAY_SIZE(sorted_arr) % 2 = 1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
           ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
      END AS median
    FROM test_data
    ORDER BY id;
    

    For clarity cumulative_arr and sorted_arr presented as separate steps:

    SELECT *, 
      ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cumulative_arr,
      ARRAY_SORT(cumulative_arr) AS sorted_arr,
      CASE WHEN ARRAY_SIZE(sorted_arr)%2=1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
           ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
        END AS median
    FROM test_data
    ORDER BY id;
    

    Output:

    Median as window function with a sliding window