The goal is to use MEDIAN as a window function with a sliding window of a specific size.
SELECT *,
MEDIAN(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM test_data
ORDER BY id;
Sliding window frame unsupported for function MEDIAN
For sample data:
CREATE OR REPLACE TEMP TABLE test_data
AS
SELECT index AS id, ABS(RANDOM(5))%100 AS n
FROM TABLE(FLATTEN(ARRAY_GENERATE_RANGE(1,10)));
The expected output for a sliding window of size 4 is:
ID | N | MEDIAN | |
---|---|---|---|
0 | 74 | 74 | 74 |
1 | 28 | 51 | (28+74)/2 = 51 |
2 | 0 | 28 | 28 |
3 | 18 | 23 | (18+28)/2 = 23 |
4 | 44 | 23 | (18+28)/2 = 23 |
5 | 57 | 31 | (18+44)/2 = 31 |
6 | 31 | 37.5 | (31+44)/2 = 37.5 |
7 | 27 | 37.5 | (31+44)/2 = 37.5 |
8 | 96 | 44 | (31+57)/2 = 44 |
Seeking for SQL only approach.
Median with a sliding window can be emulated by using ARRAY_AGG, ARRAY_SORT and choosing the "middle" value:
SELECT *,
ARRAY_SORT(ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)) AS sorted_arr,
CASE WHEN ARRAY_SIZE(sorted_arr) % 2 = 1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
END AS median
FROM test_data
ORDER BY id;
For clarity cumulative_arr
and sorted_arr
presented as separate steps:
SELECT *,
ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cumulative_arr,
ARRAY_SORT(cumulative_arr) AS sorted_arr,
CASE WHEN ARRAY_SIZE(sorted_arr)%2=1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
END AS median
FROM test_data
ORDER BY id;
Output: