google-sheetsgoogle-apps-script

Aggregate concatenated numbers with Google Spreadsheet


I have a CSV imported to Google Spreadsheet with a column with values like 20230110_30 ::: 20240112_40. This format encodes that a user in this row has made two donations: 30EUR on 10 Jan, 2023 and 40EUR on 12 Jan, 2024. Of course, it can be one donation, two donations, three donations, etc.

How can I calculate the following aggregate values for every row in my spreadsheet:

I've tried using ArrayFormula and regexextract, but regexextract(, "_(\d+)") only extracts the first occurrence, but I need all of them to aggregate. Maybe a way forward could be doing some string replaces and pushing the resulting expression into some eval function if it exists in Spreadsheets?

I control the format of this column (except the separator of :::), so if needed I can do tweaks.

I can easily do this in Python, but struggle with Google Spreadsheets formulas. Is it possible with formulas? If not, is it possible to write formulas using Python or some other fully-fledged programming language and stay in the Google Spreadsheet environment?

Grateful for any help!


I've tried making Apps Script:

function number_of_donations(s) {
  if(Array.isArray(s))
    return s.map(number_of_donations);
  else
      return s.split(' ::: ').length;
}

This worked and I can implement the other functionality using Apps Script. Is it possible to place this code directly into the spreadsheet without a separate Apps Script project?

But I still wonder if it's possible to use some clever native Spreadsheet formulas/functions to implement the functionality above.


Solution

  • Given a string in the format 20230110_30, we can use the following regular expression to extract 2023 and 30:

    REGEXEXTRACT(string,"(....).+_(.+)")
    

    If string is composed of substrings separated by :::, string becomes the following array of strings:

    TRIM(TOCOL(SPLIT(string,":")))
    

    So the REGEXEXTRACT becomes:

    REGEXEXTRACT(TRIM(TOCOL(SPLIT(string,":"))),"(....).+_(.+)")
    

    We can then use QUERY to select the max, count and sum.

    =INDEX(
      QUERY(
        --REGEXEXTRACT(TRIM(TOCOL(SPLIT(A2,":"))),"(....).+_(.+)"),
        "select max(Col1), count(Col1), sum(Col2)"),2)  
    

    enter image description here

    If we want to apply this to the whole column, we can use MAP:

    =MAP(A2:A,
       LAMBDA(str,IF(str="",,
         INDEX(
           QUERY(
             --REGEXEXTRACT(TRIM(TOCOL(SPLIT(str,":"))),"(....).+_(.+)"),
             "select max(Col1), count(Col1), sum(Col2)"),2)))) 
    

    enter image description here