google-sheetsgroupingunique-values

Using Google Sheets, identify a grouping of rows that have a unique column value


I'm trying to find a way to solve a problem like this: Given the following data in a Google Sheet:

Recipe       Ingredient   Sequence.   Waste
Chicken Soup Celery.      1.          5%
Chicken Soup Carrots.     2.          3%
Chicken Soup Stock.       3.          0%
Chicken Soup Chicken.     4.          0%
Beef Stew.   Beef.        1.          0%
Beef Stew.   Stock.       2.          0%
Beef Stew.   Spices.      3.          0%

This data set contains two recipes, with ingredients in a sequence, where each ingredient has a waste percentage.

I'd like to write a formula (or whatever) that will work across hundreds or more of recipes. It should:

So my final output would show recipe 1 (Chicken Soup) and all of its ingredients, but ignore recipe 2 (Beef Stew) because the ingredients for chicken soup have different waste values, as opposed to all the same values in Beef stew.

I've been looking at various UNIQUE functions, but can't quite get the range to operate across multiple rows when there are hundreds of recipes.

I've worked out logic:


Solution

  • In case you need an option without REGEXMATCH, because of the possibility of having similar recipes with matching names (say for example "Chicken Soup" and "Spicy Chicken Soup" would be both gotten if Chicken Soup has differing values in percentages in only "Chicken Soup"), you could apply a filter like this:

    =FILTER(A3:D,
      BYROW(A3:A,LAMBDA(each,
        COUNTIF(INDEX(REGEXEXTRACT(INDEX(UNIQUE(A3:A&"╬"&D3:D)),"(.+)╬")),each)))>1
    )
    

    If you have a lot of data, you can try this that first filters the empty values and should be faster:

    =LAMBDA(filtered,FILTER(filtered,
      BYROW(filtered,LAMBDA(each,
        COUNTIF(INDEX(REGEXEXTRACT(INDEX(UNIQUE(A3:A&"╬"&D3:D)),"(.+)╬")),index(each,1,1))))>1
    ))(FILTER(A3:D,A3:A<>""))
    

    enter image description here