data-sciencekdb+qmoving-average

in KDB q, how do you write a custom mavg that drops the top 5 and bottom 5 results before doing moving average


I have a 5min table that has a score value like the table below. How can I add a new column which is the 10-period moving average which drops the highest 2 and lowest 2 scores first, and just average the 6 remaining values?

q)select from scores_table

|datetime                       | score |
|------------------------------ | ----- |
|2022.10.28D22:20:00.000000000  | 223   |
|2022.10.28D22:25:00.000000000  | 221   |
|2022.10.28D22:30:00.000000000  | 231   |
|2022.10.28D22:35:00.000000000  | 182   |
|2022.10.28D22:40:00.000000000  | 204   |
|2022.10.28D22:45:00.000000000  | 143   |
|2022.10.28D22:50:00.000000000  | 228   |
|2022.10.28D22:55:00.000000000  | 203   |
|2022.10.28D23:00:00.000000000  | 158   |
|2022.10.28D23:05:00.000000000  | 98    |
|2022.10.28D23:10:00.000000000  | 111   |
|2022.10.28D23:15:00.000000000  | 146   |
|2022.10.28D23:20:00.000000000  | 94    |
|2022.10.28D23:25:00.000000000  | 159   |
|2022.10.28D23:30:00.000000000  | 93    |
|2022.10.28D23:35:00.000000000  | 91    |
|2022.10.28D23:40:00.000000000  | 132   |
|2022.10.28D23:45:00.000000000  | 77    |
|2022.10.28D23:50:00.000000000  | 179   |
|2022.10.28D23:55:00.000000000  | 140   |

In other words, I want to do something like "update mavg10:10 mavg score from scores_table" but before doing the average of the past 10 items, drop the highest 2 and lowest 2 scores, then do average on the rest. And then have the result in a new column.

I also read: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window and I tried modifying the swin2 code snippet there, but I couldn't get it to work :(

Any advice would be appreciated, thanks.


Solution

  • If you look at swin from the link you posted, you can see that it uses the inbuild avg function. So all you have to do, is to write a function, that takes the elements at index 2 to 7 of a sorted 10 element list (index 0 to 7, discharging index 0,1 and 8,9) and calculate the avg of those.

    q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
    q)swin[avg; 3; til 10]
    

    let's build that function

    x where (iasc iasc 0N!x:-10?10) within 2 7
    8 0 1 4 9 6 7 3 5 2
    4 6 7 3 5 2
    

    as you can see from above, applying iasc twice gives you the indexes of the list (sorted in ascending order), we then ignore index 0,1 and 8,9. now apply the average

    f:{avg x where (iasc iasc x) within 2 7}
    

    we can now use the function f together with swin

    t:([] datetime:.z.D+20?.z.t; score:20?1000)
    t
    datetime                      score
    -----------------------------------
    2025.03.02D14:34:54.963000000 922
    2025.03.02D08:24:53.628000000 360
    2025.03.02D09:26:16.172000000 605
    2025.03.02D01:38:45.609000000 550
    2025.03.02D11:01:05.002000000 489
    2025.03.02D10:40:23.165000000 496
    2025.03.02D15:01:41.486000000 989
    2025.03.02D08:30:06.037000000 704
    2025.03.02D03:28:55.129000000 416
    2025.03.02D09:57:28.855000000 366
    2025.03.02D09:28:13.942000000 347
    2025.03.02D11:03:58.982000000 270
    2025.03.02D14:37:23.380000000 221
    2025.03.02D00:06:59.623000000 396
    2025.03.02D13:02:02.104000000 206
    2025.03.02D15:02:05.012000000 169
    2025.03.02D04:35:02.468000000 437
    2025.03.02D05:00:51.910000000 450
    2025.03.02D09:37:08.266000000 187
    2025.03.02D12:09:01.385000000 798
    
    update swin[f;10;score] from t
    datetime                      score
    --------------------------------------
    2025.03.02D14:34:54.963000000 0
    2025.03.02D08:24:53.628000000 0
    2025.03.02D09:26:16.172000000 60
    2025.03.02D01:38:45.609000000 151.6667
    2025.03.02D11:01:05.002000000 233.1667
    2025.03.02D10:40:23.165000000 315.8333
    2025.03.02D15:01:41.486000000 416.6667
    2025.03.02D08:30:06.037000000 534
    2025.03.02D03:28:55.129000000 543.3333
    2025.03.02D09:57:28.855000000 543.3333
    2025.03.02D09:28:13.942000000 487
    2025.03.02D11:03:58.982000000 487
    2025.03.02D14:37:23.380000000 444
    2025.03.02D00:06:59.623000000 418.3333
    2025.03.02D13:02:02.104000000 381.8333
    2025.03.02D15:02:05.012000000 336
    2025.03.02D04:35:02.468000000 336
    2025.03.02D05:00:51.910000000 336
    2025.03.02D09:37:08.266000000 301
    2025.03.02D12:09:01.385000000 312.8333
    

    Edit: Note, index within 2 7 is hardcoded but that can easily be modified