arraysgoogle-sheetscountsequencearray-formulas

How to use an ARRAYFORMULA in Google Sheets that references cells in the same column


After getting sick and tired of having to copy formulas back into my sheet anytime I needed to add a row (one of my gripes of Google Sheets where Excel is much better). I've decided to try using ARRAYFORMULA in row 2 of all my sheets to basically make column formulas. Google Support pages suggests this is an exact replacement for the functionality in Excel - it's not). Note that I don't think either Excel or Google does Column formulas well - but Excel definitely does it better then Google Sheets in this case.

Background

Just using ARRAYFORMULA with a known range works well anytime I add a row in the MIDDLE of that range. However, it doesn't work well when I add a new row to the end of my range that I want to be included. I have to manually change the last row in my ARRAYFORMULA formula if I add a row to the end, or I have to make my last row a "dummy" row with a note that says - don't add new rows, always add to the middle and hope other people using the sheet (or even myself) remember to abide by it. Using large sheets with lots of data, one person not following the rule can majorly screw it up for everyone sharing it. I like to have as much automated as possible to minimize costly mistakes.

I tried using ARRAYFORMULA using whole columns (e.g. A:A, B:B, etc.) but if it's a formula where I need a result output to each row (simple example: = ARRAYFORMULA ( C:C - 1), I get an #N/A result in the cell and the following error text:

Result was not automatically expanded, please insert more rows

UPDATE: This error was because the formula was in row 2 and therefore full columns (A:A, B:B, C:C, G:G) were always one more row than what was available in the sheet. Using C$2:C ($ before 2 is necessary), G$2:G, etc. solves that issue.

My workaround for that was to add a cell in a hidden column on my sheets with the following formula:

= ARRAYFORMULA( MAX( IF( LEN(A:A), ROW(A:A), ) ) )

Note: Whole columns work here because I'm using the MAX function which then returns a single value.

I then name that cell something to the effect of last_XXXX_row where XXXX is a short version of the name of the sheet so I have a constant I can reference and know what the last active row of the sheet is. Then I protect the cell and hide it.

It gets a little annoying as now I have to use INDIRECT everywhere and the formulas get long, but for the most part it works. For example:

 = ARRAYFORMULA( ( $C$2:INDIRECT( "$C$" & last_unit_row) = 1 ) )

on my "unit" sheet returns TRUE or FALSE based on whether the value in column C is equal to 1 or not and returns the corresponding result in each row of the column I put this in. It's kind of long, but now at least I don't have to enter the formula in every row and then re-enter the formula every time I add a row - whether in the middle or the end of the sheet, it automatically updates the column as I add them. Yay.

NOTE: Logic wise, using $C$2:$C works and is a much shorter equation. However, I discovered that as you add data, it bogs the spreadsheet down significantly (and it's even slower without the $) - so I still recommended using indirect as per my example above, which works much faster.

Issue

Some formulas do not work as a direct analog when using ARRAYFORMULA. For instance, I've learned that the INDEX function inside of ARRAYFORMULA prevents ARRAYFORMULA from executing on the entire array, so have to avoid that. There's probably a few others I haven't tried yet.

My particular issue is in a column that needs to know something in the column above it. In both Excel and Google Sheets I often use a count up / reset column to track how many entries there are in a given category. For example, such a formula in column B dependent on a category value in column G typically looks like this:

= IF (G2 <> G1, 0, B1 + 1)

Then when I fill down with that formula, it automatically changes all cell references to the needed rows. It's checking a category label in column G - and if that label changes, it resets to 0 (sometimes I reset to 1, depending), otherwise it increments the value in column B. This is helpful when there isn't a uniform number of entries for each category and each entry needs a subindex.

I can't seem to get this to work using ARRAYFORMULA.

Attempted Solutions

I tried this:

= ARRAYFORMULA( IF( $G2:INDIRECT( "$G$" & last_item_row ) <> $G1:INDIRECT( "$G$" & ( last_item_row - 1 ) ), 0, $B1:INDIRECT( "$B$" & ( last_item_row -1 ) ) ) )

And I get a #REF result in the cell with the error text:

Circular Dependency Detected. To resolve with iterative calculation, see File > Settings

So... it sort of makes sense as it appears that there's a reference to the cell the formula is in inside the range that's created by INDIRECT. However, if the formula was executed properly, it's would always calculate based on the cell ABOVE it and never actually use its own cell as part of the calculation.

If I could use INDEX instead of INDIRECT I ought to be able to avoid this, but I can't.

UPDATE: This formula is basically correct mathematically:

= ARRAYFORMULA ( IF( $G$1:INDIRECT( "$G$" & ( last_item_row - 1 ) ) <> $G$2:INDIRECT( "$G$" & ( last_item_row ) ), 0, ($B$1:INDIRECT( "$B$" & ( last_item_row - 1 ) ) + 1 ) ) )

However, it requires that Iterative calculations are turned on, and it has a maximum value it will "max out" at based on the number of iterations allowed - and there are diminishing returns as the number of iterations is increased. At 100 iterations, it maxes at 10 - my real data has some categories that have 25 sub indices and the spreadsheet gets slower to calculate as iterations goes up, so this isn't a viable solution.

Am I overcomplicating this? Is there a simpler solution I'm not seeing? I'm trying to us COUNTIF as well [ Non-Array version of the formula that works when filled down: =COUNTIF($G$1:$G1,$G2) ], but haven't gotten it to work.

Closest ARRAYFORMULA version I have is this:

=ARRAYFORMULA( COUNTIF($G$1:($G1:INDIRECT( "$G$" & ( last_item_row - 1 ) ) ), $G2:INDIRECT( "$G$" & last_item_row ) ) )

I'm surprised that even worked at all - it returns array values, but it gets me the total number of times that category appears in every row, instead of just the ones leading up to that row.

Example

Sub Index Example

The example above, which uses the formula = if( B2<>B1, 0, A1 + 1 ) in Cell A2 and filled down to cell A13, shows example input (Category) and the desired output (Sub Index). With this formula, however, if I add to cell B14, A14 will not populate unless I copy and paste or fill the formula down to the next row. I want an ARRAYFORMULA in Cell A2 that will automatically fill the cells below it when I add additional data in column B (whether below or by adding a row in between) without having to touch the formula again.


Solution

  • I tried using ARRAYFORMULA using whole columns (e.g. A:A, B:B, etc.) but if it's a formula where I need a result output to each row

    you can always freeze it like:

    =INDIRECT("A:A")
    

    this way you can add rows anywhere you want (if you of course not add new row above the row that holds the formula - that would be troublesome to fit in A:A into A2:A)


    that the INDEX function just does not work with ARRAYFORMULA at all

    INDEX is already ARRAYFORMULA type of formula. the analogy being here as: you need a car to get from A to B where INDEX is a blue car with 3 doors and ARRAYFORMULA is a red car with 5 doors - it doesn't matter what color you have, you just need a car


    = IF (G2 <> G1, 0, B1 + 1)

    while this is direct logic there are several ways how to achieve the same thing. proper usage would require not to use such formulas as ARRAYFORMULA in column G or B to avoid circular dependency errors. for a simple resetting count up try:

    =ARRAYFORMULA(COUNTIFS(B1:B7, B1:B7, 
     SEQUENCE(ROWS(B1:B7)),"<="&SEQUENCE(ROWS(B1:B7))))
    

    enter image description here

    feel free to change B1:B7 to open range or frozen range...


    update:

    =INDEX(IF(B2:B="";; COUNTIFS(B2:B; B2:B; 
     SEQUENCE(ROWS(B2:B)); "<="&SEQUENCE(ROWS(B2:B)))-1))
    

    0