excelexcel-formulaarray-formulasexcel-365

Looking for an alternative to COUNTIFS() that will work with a dynamic range (eg: array output of FILTER() function)


Off the top: this question is similar to others that have been asked, but solutions on those others have not helped me here.

Ex: Using an array formula in countifs range, Using Range as Criteria in Countif, CountIf With Filtered Data

The Ask: Using Excel 365, I am doing matrix multiplication on two frequency-weighted variables in a large table that requires a fair bit of filtering. Here's a minimal version of my data, a working example of what I'm trying to do, followed by the wrinkle I'm trying to solve:

Row A-Filter1 B-Filter2 C-Var1 D-Var2
2 1 1 1 10
3 1 1 5 12
4 1 1 1 11
5 1 1 2 12
6 1 1 4 13
7 1 2 5 11
8 1 2 3 12
9 1 2 1 12
10 1 2 2 13
11 1 2 4 14

The steps to do my calc are as follows:

A: Axis1: Frequency of unique values of Var1 where Filter1=1 multiplied by Var1 (to get freq-weighted Var1):

  1. Get Unique Var1,and sort: SORT(UNIQUE(FILTER($C$2:$C$11, ($A$2:$A$11=1) )))
  2. Get count of #1 by value: COUNTIFS($C$2:$C$11, <result of #1>,$A$2:$A$11,1)
  3. Express #2 as a %: <for each of #2> / SUM(<result of #2>)
  4. Multiply #1 by #3 to get Freq-weighted Var1: <result of #1> * <result of #3>

B: Axis2: Frequency of unique values of Var2 where Filter1=1 AND Filter2=1 multiplied by Some value X less Var2 (to get freq-weighted "X - Var2"):

Steps are exactly the same as Axis #1, but adding the extra Filter and the subtracting Var2 from X

C: Matrix Multiply: SUM(<Each ofAxis1:Result #4> * <Each ofAxis2:Result #4>)

D: Multiply Step C by count of Filter1=1 & Filter2=2: <Resulted of Step C> * COUNTIFS($B$2:$B$11, 1, $A$2:$A$11, 1)

The purpose of this math doesn't really matter, but if it helps wrap your head around it you can think of Filter1 = month and Filter2 = day, so we're coming up with a sum of the frequency-weighted occurrence of Var1 in a month to Var2 in a day, across the # of hours in a day.

In any case, I'll save you all the time and share a single equation/function that does exactly what I described above (though I need to transpose one axis to get the matrix-multiplication to work). And I set X = 50:

=SUM(
  (
    COUNTIFS($C$2:$C$11,
                 TRANSPOSE(SORT(UNIQUE(FILTER($C$2:$C$11, ($A$2:$A$11=1) )))),
             $A$2:$A$11, 1)
    / COUNTIFS($A$2:$A$11, 1)
    * TRANSPOSE(SORT(UNIQUE(FILTER($C$2:$C$11, ($A$2:$A$11=1) ))))
  )
  *
  (
    COUNTIFS($D$2:$D$11,
                 SORT(UNIQUE(FILTER($D$2:$D$11, ($B$2:$B$11=1)*($A$2:$A$11=1) ))),
             $B$2:$B$11, 1,
             $A$2:$A$11, 1)
    / COUNTIFS($B$2:$B$11, 1, $A$2:$A$11, 1)
    * (50 - SORT(UNIQUE(FILTER($D$2:$D$11, ($B$2:$B$11=1)*($A$2:$A$11=1) ))) )
  )
) * COUNTIFS($B$2:$B$11, 1, $A$2:$A$11, 1)

Result = 537.6

So with that working great and being super fast even on 100,000 rows of data, that brings me to my question:

The above data is simplified. In addition to there being more than two filter columns (easy enough to get around), Var1 and Var2 are, themselves, calculated fields AND need to be filtered up on as well.

I'll just focus on Var1 because the solution will be the same for both: Let's re-write Var1 as: Var1 = MAX(3 - Var1, 0) or to make it more array friendly: Var1 = (Var1<3) * (3-Var1)

If we go back to my larger formula above, that means what I WANT to do is change this:

  (
    COUNTIFS($C$2:$C$11,
                 TRANSPOSE(SORT(UNIQUE(FILTER($C$2:$C$11, ($A$2:$A$11=1) )))),
             $A$2:$A$11, 1)
    / COUNTIFS($A$2:$A$11, 1)
    * TRANSPOSE(SORT(UNIQUE(FILTER($C$2:$C$11, ($A$2:$A$11=1) ))))
  )

to

  (
    COUNTIFS(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ),
                 TRANSPOSE(SORT(UNIQUE(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) )))),
             $A$2:$A$11, 1)
    / COUNTIFS($A$2:$A$11, 1)
    * TRANSPOSE(SORT(UNIQUE(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ))))
  )

Result = 153.6 (using helper columns)

Everything works except that first line: COUNTIFS( FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ), ... because obviously COUNTIFS won't accept an array instead of a range. Incidentally, making a range with that formula works, but only if you set it equal to the same # of values as is contained in the full table. In either case, creating a "helper range" isn't practical in my application.

I've tried a variety of combinations of SUM and SUMPRODUCT formulas entered as array-formulas and not, but so far have not found the winning combination.


Solution

  • Since you are doing frequencies I suggest replacing Countifs by Frequency as follows:

    =SUM(
      (
        TRANSPOSE(DROP(FREQUENCY(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ),
                     (SORT(UNIQUE(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ))))),-1))
    
        / COUNTIFS($A$2:$A$11, 1)
        * TRANSPOSE(SORT(UNIQUE(FILTER(3 - $C$2:$C$11, ($A$2:$A$11=1)*($C$2:$C$11<3) ))))
      )
      *
      (
        COUNTIFS($D$2:$D$11,
                     SORT(UNIQUE(FILTER($D$2:$D$11, ($B$2:$B$11=1)*($A$2:$A$11=1) ))),
                 $B$2:$B$11, 1,
                 $A$2:$A$11, 1)
        / COUNTIFS($B$2:$B$11, 1, $A$2:$A$11, 1)
        * (50 - SORT(UNIQUE(FILTER($D$2:$D$11, ($B$2:$B$11=1)*($A$2:$A$11=1) ))) )
      )
    ) * COUNTIFS($B$2:$B$11, 1, $A$2:$A$11, 1)
    

    enter image description here

    As mentioned by @P.b, we don't know if you have Excel 2021 or 365. If you can't use Drop to ignore the last element of the array returned by Frequency, you should be able to use Index or in this case filter instead. Using let to avoid repetition:

    =SUM(
      (LET(filter,FILTER(3-$C$2:$C$11,($A$2:$A$11=1)*($C$2:$C$11<3)),
       freq,FREQUENCY(filter,(SORT(UNIQUE(filter)))),
        TRANSPOSE(FILTER(freq,freq>0))
        /COUNTIFS($A$2:$A$11,1)
        *TRANSPOSE(SORT(UNIQUE(filter))))
      )
    *
      (
      COUNTIFS($D$2:$D$11,
       SORT(UNIQUE(FILTER($D$2:$D$11,($B$2:$B$11=1)*($A$2:$A$11=1)))),
         $B$2:$B$11,1,
         $A$2:$A$11,1)
      /COUNTIFS($B$2:$B$11,1,$A$2:$A$11,1)
      *(50-SORT(UNIQUE(FILTER($D$2:$D$11,($B$2:$B$11=1)*($A$2:$A$11=1)))))
      )
     )*COUNTIFS($B$2:$B$11,1,$A$2:$A$11,1
    )