I come from an R background where the apply
family of functions makes it incredibly easy to apply a function over multiple columns at once. In the updated example below, I have a table with multiple "pctile" columns, with numbers ranging from 1 - 100 in these columns:
WITH
wide_stats AS (
SELECT
'joe' name, 'bills' team, false as is_o,
7 as stat1, 95 as stat1_pctile,
9 as stat2, 82 as stat2_pctile,
3 as stat3, 67 as stat3_pctile
UNION ALL
SELECT
'joe', 'bills', true as is_o,
1 as stat1, 37 as stat1_pctile,
5 as stat2, 54 as stat2_pctile,
2 as stat3, 17 as stat3_pctile
UNION ALL
SELECT
'tim' name, 'jets' team, false as is_o,
17 as stat1, 65 as stat1_pctile,
12 as stat2, 85 as stat2_pctile,
13 as stat3, 69 as stat3_pctile
UNION ALL
SELECT
'tim' name, 'jets' team, true as is_o,
15 as stat1, 32 as stat1_pctile,
14 as stat2, 56 as stat2_pctile,
12 as stat3, 16 as stat3_pctile
)
SELECT * FROM wide_stats
To each column that ends in _pctile
, I would like to run the function if(is_o, 100 - col_val, col_val)
, that simply (a) checks if the is_o
column is true, and (b) if true, then do 100 - col_val
on whatever value is already in the column, otherwise leave the column unchanged. I would like to achieve this without having to manually write this code out for each column ending in _pctile
, as I have 100+ of these in my actual, very wide data.
Although this is BQ instead of R, this type of apply
functionality would help me significantly with my task.
Unfortunately, you are paying the price for the non-optimal schema you've chosen to use.
The only option I see here is to flatten your data such that each key(stat1, stat2, stat3, etc) is on separate row as in example below (BigQuery Standard SQL)
#standardSQL
SELECT * REPLACE(IF(is_o, 100 - value_pctile, value_pctile) AS value_pctile)
FROM (
SELECT name, team, is_o,
REPLACE(SPLIT(kv, '":')[OFFSET(0)], '_pctile', '') key,
CAST(REGEXP_EXTRACT(TO_JSON_STRING(t), CONCAT(r'"', REPLACE(SPLIT(kv, '":')[OFFSET(0)], '_pctile', ''), r'":(\d+),')) AS INT64) value,
SPLIT(kv, '":')[OFFSET(0)] key_pctile,
CAST(SPLIT(kv, '":')[OFFSET(1)] AS INT64) value_pctile
FROM wide_stats t, UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+_pctile":\d+),')) kv
)
if to apply to sample data from your question - result is
Row name team is_o key value key_pctile value_pctile
1 joe bills false stat1 7 stat1_pctile 95
2 joe bills false stat2 9 stat2_pctile 82
3 joe bills true stat1 1 stat1_pctile 63
4 joe bills true stat2 5 stat2_pctile 46
5 tim jets false stat1 17 stat1_pctile 65
6 tim jets false stat2 12 stat2_pctile 85
7 tim jets true stat1 15 stat1_pctile 68
8 tim jets true stat2 14 stat2_pctile 44
As you can see, if you would have your table flattened to start with - the query would be as simple as
#standardSQL
SELECT * REPLACE(IF(is_o, 100 - value_pctile, value_pctile) AS value_pctile)
FROM flattened_table
Note: for all practical purposes - you anyway will need to flatten whatever result you need otherwise you constantly be facing same very issue like this one!