I have a CSV imported to Google Spreadsheet with a column with values like 20230110_30 ::: 20240112_40
. This format encodes that a user in this row has made two donations: 30EUR on 10 Jan, 2023 and 40EUR on 12 Jan, 2024. Of course, it can be one donation, two donations, three donations, etc.
How can I calculate the following aggregate values for every row in my spreadsheet:
I've tried using ArrayFormula
and regexextract
, but regexextract(, "_(\d+)")
only extracts the first occurrence, but I need all of them to aggregate. Maybe a way forward could be doing some string replaces and pushing the resulting expression into some eval
function if it exists in Spreadsheets?
I control the format of this column (except the separator of :::
), so if needed I can do tweaks.
I can easily do this in Python, but struggle with Google Spreadsheets formulas. Is it possible with formulas? If not, is it possible to write formulas using Python or some other fully-fledged programming language and stay in the Google Spreadsheet environment?
Grateful for any help!
I've tried making Apps Script:
function number_of_donations(s) {
if(Array.isArray(s))
return s.map(number_of_donations);
else
return s.split(' ::: ').length;
}
This worked and I can implement the other functionality using Apps Script. Is it possible to place this code directly into the spreadsheet without a separate Apps Script project?
But I still wonder if it's possible to use some clever native Spreadsheet formulas/functions to implement the functionality above.
Given a string in the format 20230110_30
, we can use the following regular expression to extract 2023
and 30
:
REGEXEXTRACT(string,"(....).+_(.+)")
If string
is composed of substrings separated by :::
, string
becomes the following array of strings:
TRIM(TOCOL(SPLIT(string,":")))
So the REGEXEXTRACT
becomes:
REGEXEXTRACT(TRIM(TOCOL(SPLIT(string,":"))),"(....).+_(.+)")
We can then use QUERY
to select the max
, count
and sum
.
=INDEX(
QUERY(
--REGEXEXTRACT(TRIM(TOCOL(SPLIT(A2,":"))),"(....).+_(.+)"),
"select max(Col1), count(Col1), sum(Col2)"),2)
If we want to apply this to the whole column, we can use MAP
:
=MAP(A2:A,
LAMBDA(str,IF(str="",,
INDEX(
QUERY(
--REGEXEXTRACT(TRIM(TOCOL(SPLIT(str,":"))),"(....).+_(.+)"),
"select max(Col1), count(Col1), sum(Col2)"),2))))