BigQuery supports:
The question #1: "Does BigQuery support analytic user-defined functions?"
The motivation behind this is that I want to implement the split-apply-combine pattern that is usually seen in Python pandas code. This could be useful for in-group normalization and other transformations that use group statistics.
I did a small test in Standart SQL:
create or replace function `mydataset.mylen`(arr array<string>) returns int64 as (
array_length(arr)
);
WITH Produce AS
(SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
UNION ALL SELECT 'orange', 2, 'fruit'
UNION ALL SELECT 'cabbage', 9, 'vegetable'
UNION ALL SELECT 'apple', 8, 'fruit'
UNION ALL SELECT 'leek', 2, 'vegetable'
UNION ALL SELECT 'lettuce', 10, 'vegetable')
SELECT
item,
purchases,
category,
`mydataset.mylen`(item) over (mywindow) as windowlen
FROM Produce
window mywindow as (
partition by category
)
When I run the code above, I get:
Query error: Function mydataset.mylen does not support an OVER clause at [16:3]
Thus, in case BigQuery does support analytic UDFs, the question #2: "How to implement a UDF so that it supports an OVER clause?"
You are very close to solving the problem :)
A little context for the reader of the answer, BigQuery doesn't support user-defined aggregate/analytical function, so a way of emulating it is to write a scalar UDF accepting an array as input. Then in the query, array_agg() function is used to pack the data as input to the UDF, (and this is the step missing from the question).
`mydataset.mylen`(item) over (mywindow) as windowlen
=>
`mydataset.mylen`(array_agg(item) over (mywindow)) as windowlen