if-statementgoogle-sheetslambdareducearray-formulas

Data manipulation in google sheets with arrayformula and lambda helpers


i've got a quite complex situation to solve with google sheets. I already solved it in a very unelegant way, and I would need your help to understand why other solutions I tried failed, and maybe to find a better solution. I will put a link to the sheet in the end of the post.

Here is the scenario:

My data is composed of a list of employees (A2:A), a column for a cost centre (B2:B) and a column for each month (C2:N) containing a percentage of imputation for that employee, in that month for that cost centre. This allows me to distribute the employee cost in different cost centres with another file. This means the same employee can occur several times, but with different cost centres (like employee name 1 in my data)

Dipendente Cost Centre 01/01/2024 01/02/2024 01/03/2024 01/04/2024 01/05/2024 01/06/2024 01/07/2024 01/08/2024 01/09/2024 01/10/2024 01/11/2024 01/12/2024
Employee Name1 0 - HABITAT 10,00% 30,00% 90,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00%
Employee Name1 0 - COMUNICAZIONE 90,00% 10,00% 100,00% 50,00% 0,00% 0,00% 0,00% 0,00% 0,00% 0,00% 0,00% 0,00%
Employee Name2 SAI 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00%
Employee Name3 SAI PALERMO 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00%
Employee Name4 0 - AMMINISTRAZIONE 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00% 100,00%

Starting Data

The output I want is an array where, for each combination of employee and cost centre, there should 12 rows (1 for each month) with the imputation percentage. The output I need is the following:

Mese Dipendente CdC Imputazione
gen 24 Employee Name1 0 - COMUNICAZIONE 90,00%
gen 24 Employee Name1 0 - HABITAT 10,00%
gen 24 Employee Name2 SAI 100,00%
gen 24 Employee Name3 SAI PALERMO 100,00%
feb 24 Employee Name1 0 - COMUNICAZIONE 10,00%
feb 24 Employee Name1 0 - HABITAT 30,00%
feb 24 Employee Name2 SAI 100,00%
feb 24 Employee Name3 SAI PALERMO 100,00%
mar 24 Employee Name1 0 - COMUNICAZIONE 100,00%
mar 24 Employee Name1 0 - HABITAT 90,00%
mar 24 Employee Name2 SAI 100,00%
mar 24 Employee Name3 SAI PALERMO 100,00%
apr 24 Employee Name1 0 - COMUNICAZIONE 50,00%
apr 24 Employee Name1 0 - HABITAT 100,00%
... ... ... ...

Desired Output

To get this output I made use of several named functions (may google be blessed for this amazing feature):

Function TUPLEGENERATION(employee): it generates all the rows for a single employee for a single cost centre. This is the code, where the employee reference can be any cell in the A2:A range:

=TOROW({
transpose(_MONTHS) \                                      //_MONTHS is a defined range just containing an array of months from january to december
transpose(split(rept(employee&"|";12);"|")) \             //this allows me to repeat 12 times the name of the employee, 1 time for each month
transpose(split(rept(offset(employee;0;1)&"|";12);"|")) \ //this does the same as above, but with the cost centre (its' the cell next to the employee, so I use offset)
transpose(offset(employee;0;2;1;12))})                    //this gets the 12 imputation values (next 12 columns)

the output of this function, applied to the first data row, is the following: OUTPUT and here is the transposed version, to let you better understand it: TRANSPOSED OUTPUT i'm not including the table output here, as it's not very significant

Function MULTITUPLEGEN: it just applies a BYROW to the TUPLEGENERATION function, so instead of a single cell, it requires a range as a parameter. Here is the code:

=BYROW(range;TUPLEGENERATION)

here is the output applied to the first 3 rows of my data

gen 24 Employee Name1 0 - HABITAT 10,00% feb 24 Employee Name1 0 - HABITAT 30,00% mar 24 Employee Name1 0 - HABITAT 90,00% ...
gen 24 Employee Name1 0 - COMUNICAZIONE 90,00% feb 24 Employee Name1 0 - COMUNICAZIONE 10,00% mar 24 Employee Name1 0 - COMUNICAZIONE 100,00% ...
gen 24 Employee Name2 SAI 100,00% feb 24 Employee Name2 SAI 100,00% mar 24 Employee Name2 SAI 100,00% ...
gen 24 Employee Name3 SAI PALERMO 100,00% feb 24 Employee Name3 SAI PALERMO 100,00% mar 24 Employee Name3 SAI PALERMO 100,00% ...
gen 24 Employee Name4 0 - AMMINISTRAZIONE 100,00% feb 24 Employee Name4 0 - AMMINISTRAZIONE 100,00% mar 24 Employee Name4 0 - AMMINISTRAZIONE 100,00% ...

MULTITUPLEGEN OUTPUT

Function MONTHGEN: here the magic happens. It uses the output of MULTITUPLEGEN to ectract a specific month data for that range of employees. the first parameter is a number from 1 to 12, corresponding to the month you want generated the second parameter is the range of employees attributions to generate Here is the code:

=CHOOSECOLS(MULTITUPLEGEN(range);sequence(4;1;month*4-3;1))

and here is the output applied to the first 3 rows of my data

mar 24 Employee Name1 0 - HABITAT 90,00%
mar 24 Employee Name1 0 - COMUNICAZIONE 100,00%
mar 24 Employee Name2 SAI 100,00%

MONTHGEN OUTPUT

So far, nothing is too much unelegant (yes, it could be better optimized, but i'm not really worried about it. I split the code in several functions in a series of trials and errors related to the use of Arrayformula (i thought i could fool the engine to avoid certain problems), but the only solution I found to work for getting from here to the desired output is the following:

UGLY FUNCTION

=sort({MONTHGEN(1;attrib!A2:A4);MONTHGEN(2;attrib!A2:A4);MONTHGEN(3;attrib!A2:A4);MONTHGEN(4;attrib!A2:A4);
MONTHGEN(5;attrib!A2:A4);MONTHGEN(6;attrib!A2:A4);MONTHGEN(7;attrib!A2:A4);MONTHGEN(8;attrib!A2:A4);
MONTHGEN(9;attrib!A2:A4);MONTHGEN(10;attrib!A2:A4);MONTHGEN(11;attrib!A2:A4);MONTHGEN(12;attrib!A2:A4)})

here is the output applied to the same 3 rows:

gen 24 Employee Name1 0 - COMUNICAZIONE 90,00%
gen 24 Employee Name1 0 - HABITAT 10,00%
gen 24 Employee Name2 SAI 100,00%
feb 24 Employee Name1 0 - COMUNICAZIONE 10,00%
feb 24 Employee Name1 0 - HABITAT 30,00%
feb 24 Employee Name2 SAI 100,00%
mar 24 Employee Name1 0 - COMUNICAZIONE 100,00%
mar 24 Employee Name1 0 - HABITAT 90,00%
mar 24 Employee Name2 SAI 100,00%
apr 24 Employee Name1 0 - COMUNICAZIONE 50,00%
... ... ... ...

output

The output is the desired one, and in my last version, the range is a static range so it doesn't appear in the formula at all, but I really cannot stand that I had to manually concatenate 12 times a function with an incremental number parameter, without being able to use ARRAYFORMULA or other similar functions.

Every attempt to automatically iterate the functions resulted in ugly results with misplaced data in the columns.

I spent hours trying different approaches to the problem, involving different starting functions and data manipulation, but here are the main problems I hit:

  1. WRAPROWS is a very useful function applied to my TUPLEGENERATION function, but accepts only a single row as a range. It is not able to stack several ranges on top of another, so it's useless with MULTITUPLEGEN function, and I was not able to iterate it with ARRAYFORMULA or BYROW.

  2. VSTACK is useful if you manually insert it it several WRAPROWS(TUPLEGENERATION) outputs, but i was not able to iterate it with BYROW or ARRAYFORMULA to generate my output

  3. In general, when iterating a certain range, I found it's quite difficult to expand it dynamically generating more rows in the middle of it. the starting range. I tried most LAMBDA helper functions, like SCAN, MAKEARRAY, LET, etc., but the main problem is that most of them are allowed to only return 1 row per original iteration, and trying to bypass this by using transposition and stacking just doesn't work because of the order of execution, as the transposition happens at the wrong time returning bad data. I admit I didn't really dive long into MAKEARRAY though, so I cannot exclude I just didn't use it to its full potential.

here is the sheet: https://docs.google.com/spreadsheets/d/e/2PACX-1vS7Vi43KDQigiBlAR9HnTsfK9TSo7PoSPTZZuys9GaczADUSXv_qY_uCSuwy7jeq3m9JGcqDACs_ywi/pubhtml

thanks in advance for the help!


Solution

  • it's basically easy as:

    =INDEX(SORT(SPLIT(TOCOL(C1:N1&"×"&A2:A6&"×"&B2:B6&"×"&C2:N6); "×")))
    

    enter image description here

    feeding it dynamic range would be:

    =INDEX(LET(x; C2:INDEX(N:N; MAX(ROW(A:A)*(A:A<>""))); 
     SORT(SPLIT(TOCOL(OFFSET(x;-1;;1;)&"×"&OFFSET(x;;-2;;1)&"×"&OFFSET(x;;-1;;1)&"×"&x); "×"))))
    

    enter image description here

    and first column of output can be formatted either internally via 123 button:

    enter image description here

    or with formula:

    =QUERY(INDEX(LET(x; C2:INDEX(N:N; MAX(ROW(A:A)*(A:A<>""))); SORT(SPLIT(TOCOL(
     OFFSET(x;-1;;1;)&"×"&OFFSET(x;;-2;;1)&"×"&OFFSET(x;;-1;;1)&"×"&x); "×")))); 
     "format Col1 'mmm yyyy'"; )
    

    enter image description here