excelperformancesub-arraysumproductnon-repetitive

Sumproduct of sub-arrays in excel


I need to find a performant and smart way to redesign the formula or the tables on which it depends (COSTO_DUMMY and GG_TARGET). Can you help me, please? I can add new support tables if needed

COSTO_DUMMY

key cost
AP01100GENNAIO 33.9492
AP01100FEBBRAIO 32.108
AP01100MARZO 27.889
AP01100APRILE 34.7004
AP01100MAGGIO 29.2037
AP01100GIUGNO 33.3176
AP01100LUGLIO 31.6459
AP01100AGOSTO 49.5292
AP01100SETTEMBRE 29.51
AP01100OTTOBRE 31.129
AP01100NOVEMBRE 30.776
AP01100DICEMBRE 34.7

GG_TARGET

key days
0050001643GENNAIO 16.2
0050001643FEBBRAIO 18.4
0050001643MARZO 21.5
0050001643APRILE 16.7
0050001643MAGGIO 20.4
0050001643GIUGNO 17.5
0050001643LUGLIO 18.4
0050001643AGOSTO 7.5
0050001643SETTEMBRE 20.4
0050001643OTTOBRE 19
0050001643NOVEMBRE 19.5
0050001643DICEMBRE 15.8

FACT_TABLE (the structure of this table cannot be modified); I have two column headers in a fixed position: the first one with AP% codes, the second one with the months

AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100 AP01100
GENNAIO FEBBRAIO MARZO APRILE MAGGIO GIUGNO LUGLIO AGOSTO SETTEMBRE OTTOBRE NOVEMBRE DICEMBRE
0050001643 1 4

I need to do this formula for each row of the fact table (to be added next to "DICEMBRE"):
(1 * jan_cost * jan_days + 1 * feb_cost * feb_days + ... + 1 * dec_cost * dec_days) +
(4 * feb_cost * feb_days + 4 * mar_cost * mar_days + ... + 4 * dec_cost * dec_days) +
(0 * mar_cost * mar_days + 0 * apr_cost * apr_days + ... + 0 * dec_cost * dec_days) +
.....
(0 * dec_cost * dec_days)

The meaning is: I want to recruit 1 person from january to december and 4 people from february to december (that's why the "4" from fact table ignores january information)

I started writing the following formula (only for the first person to recruit on january and it's missing a vlookup for gg_Target, anyway I think it's clear the way I am thinking about it... the wrong way) but I really hope there is something smarter:

=vlookup(concat(offset(G27;25-row();0);offset(G27;26-row();0));COSTO_DUMMY;2;FALSE)*G27 +
vlookup(concat(offset(G27;25-RIF.RIGA();1);offset(G27;26-row();1));COSTO_DUMMY;2;FALSE)*G27 ....

In this formula I have the "1" of the fact_table on G27 cell (hence offset(G27;25-row();0) gives me "AP01100" while offset(G27;26-row();0)) gives me "GENNAIO")
Please, note that I want a constant value, so I am not expecting to use ctrl+shift+enter. I have this office version excel version


Solution

  • Assuming no Excel version constraints as per the tags listed in the question. The following is an array formula, that spills the entire result and considers more than one target key, even in the example there is only one value.

    Assumptons and comments:

    1. The input data in both input tables are organized in a similar manner (otherwise we would need and additional lookup), i.e, the costs and days for a given key are organized following the same logic. The input data is organized chronologically in ascending order, but it works for descending too.
    2. The rows don't need to match, only the order within the same key for the dates should be the same for both tables.
    3. The information for a given key doesn't need to come one after another.
    4. For any keys there is data for the 12 months.
    5. The AP codes don't have to be the same (even in the sample data are all the same).

    All previous assumption are based on the sample input data from the question. Here is the formula that covers a very general case, i.e. more than one target code, different AP codes:

    =LET(setA, A2:B13, setB, C2:D13, fcts, F2:R4, loc, "[$-0410]mmmm",
     months, EDATE(1,SEQUENCE(1,12,0)), tMonths, UPPER(TEXT(months, loc)),
     lks, DROP(fcts,2), aps, TAKE(DROP(fcts,,1),1), idx, SEQUENCE(COLUMNS(months)),
     gMonth, LAMBDA(x, XLOOKUP(x, tMonths,months)), SPLIT,LAMBDA(x,LET(
      a,TEXTBEFORE(x,tMonths),b,SUBSTITUTE(x,a,""),HSTACK(a,gMonth(b)))),
     kA, SPLIT(TAKE(setA,,1)), cA, DROP(setA,,1), kB, SPLIT(TAKE(setB,,1)),
     cB, DROP(setB,,1), CALC, LAMBDA(k,ap,m, LET(
      fa, FILTER(cA, (INDEX(kA,,1)=ap) * (INDEX(kA,,2)>=m)),
      fb, FILTER(cB, (INDEX(kB,,1)=k) * (INDEX(kB,,2)>=m)), SUM(fa*fb))),
     BYROW(lks, LAMBDA(lk, LET(k, TAKE(lk,,1), mults, DROP(lk,,1),
      REDUCE(0, idx, LAMBDA(ac,i, LET(ap, INDEX(aps,,i), m, INDEX(months,,i),
       ac + INDEX(mults,,i) * CALC(k,ap,m))))))))
    

    It is a large formula, because it requires several intermediate calculations. It depends on the input ranges for only three names: setA, setB and fcts. The rest of the names, are obtained from them, playing with DROP, TAKE, INDEX, etc, functions.

    To do the calculation we filter for months greater or equal than. We convert the input months in text format into Excel dates. The months in date format (months) are generated using EDATE function combined with SEQUENCE. The month is represented as the first date of the month in date format (we use as a reference year 1900). We obtain the corresponding months in text format (tMonths) via TEXT function (no need to take it from the FACT table), using Italian as specific locale (locname). I took the idea from here: Specifying a Language for the TEXT Function.

    Now we use the following helper functions we created as user LAMBDA functions:

    1. gMonth(x): Given an input date x in text format returns the corresponding month in date format. Created to make the formula easier to read and for debugging purposes.
    2. SPLIT(x): Given a key x as input. It generates an array of two columns separating the key from the date. The second column returns the corresponding text date in date format.
    3. CALC(k,ap,m): Given a key (k), an AP code (ap) and a month (m in date format), does the multiplication for cost and days only for months greater or equal than m.

    Now we have all we need. We iterate over all rows (lks) from from the FACT table. Each row (lk) contains the key code (k) and the multipliers (mults). We extract this information via DROP and TAKE functions. Now we need to do the total sum for each row. We use REDUCE for that to iterate over the index positions (idx). For each index (i), we obtain the AP code (ap) and the month (m), so we can invoke the CALC function on each iteration and accumulate the result (ac):

    ac + INDEX(mults,,i) * CALC(k,ap,m)
    

    Here is the output in S3 cell: output

    As per OP additional information provided, TAKE and user LAMBDA function are not available for his Excel version. Here a solution that doesn't use such functions:

    =LET(setA, A2:B13, setB, C2:D13, fcts, F2:R4, loc, "[$-0410]mmmm",
     months, EDATE(1,SEQUENCE(1,12,0)), tMonths, UPPER(TEXT(months, loc)),
     lks, DROP(fcts,2), aps, INDEX(DROP(fcts,,1),1,0),idx, SEQUENCE(COLUMNS(months)),
     cA, DROP(setA,,1), cB, DROP(setB,,1),
     kA, LET(a,TEXTBEFORE(INDEX(setA,,1),tMonths),b,
      SUBSTITUTE(INDEX(setA,,1),a,""),HSTACK(a,XLOOKUP(b, tMonths,months))),
     kB, LET(a,TEXTBEFORE(INDEX(setB,,1),tMonths),b,
      SUBSTITUTE(INDEX(setB,,1),a,""),HSTACK(a,XLOOKUP(b, tMonths,months))),
     BYROW(lks, LAMBDA(lk, LET(k, INDEX(lk,,1), mults, DROP(lk,,1),
      REDUCE(0, idx, LAMBDA(ac,i, LET(ap, INDEX(aps,,i), m, INDEX(months,,i),
       fa, FILTER(cA, (INDEX(kA,,1)=ap) * (INDEX(kA,,2)>=m)),
       fb, FILTER(cB, (INDEX(kB,,1)=k) * (INDEX(kB,,2)>=m)),
       ac + INDEX(mults,,i) * SUM(fa*fb))))))))