excelexcel-formulacountifexcel-2016

Excel: Counting cells which contain one (or more) specific strings


I'm using Excel 2016, and I have a dataset with a large number of strings, and I need to find out how many cells in a given range ('Geotech Data'!G:G in this case) there are that contain one of 29 different strings (CPT, DCP, Soakage, Clegg, Seismicity, etc.).

I need to find a way to count the number of cells where at least one of the strings is present, ensure that there is no double-counting (there are several occasions where more than one of the strings is present in a single cell), and I need to be able to add/remove strings to the list of search terms at a later date as the program I'm plugging data into changes what data types are and are not admissible.

I've already manually written out a formula for Soakage, CPT, DCP and Clegg:

=(COUNTIF('Geotech Data'!G:G,"*CPT*")+COUNTIF('Geotech Data'!G:G,"*Clegg*")+COUNTIF('Geotech Data'!G:G,"*Soakage*")+COUNTIF('Geotech Data'!G:G,"*DCP*"))-((COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*Clegg*")+COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*Soakage*")+COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*DCP*") + COUNTIFS('Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*Soakage*")+COUNTIFS('Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*DCP*")+COUNTIFS('Geotech Data'!G:G,"*Soakage*",'Geotech Data'!G:G,"*DCP*"))-((COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*Soakage*")+COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*DCP*")+COUNTIFS('Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*Soakage*",'Geotech Data'!G:G,"*DCP*"))-COUNTIFS('Geotech Data'!G:G,"*CPT*",'Geotech Data'!G:G,"*Clegg*",'Geotech Data'!G:G,"*Soakage*",'Geotech Data'!G:G,"*DCP*")))

But given that the double-ups alone would require 29! (3.04883E+29) different COUNTIFS statements, it's obviously not feasible.

The information in question is in a pivot table, and will routinely be added to as new information becomes available to my organisation. I've already tried variations of: IF(ISNUMBER(SEARCH())) IF(SUMPRODUCT(--(NOT(ISERR(SEARCH()))))) IF(COUNT(SEARCH())) =ARRAYFORMULA(SUMPRODUCT(((ISNUMBER(SEARCH("CPT",'Geotech Data'!G:G))+ISNUMBER(SEARCH("DCP",'Geotech Data'!G:G))+ISNUMBER(SEARCH("Clegg",'Geotech Data'!G:G))+ISNUMBER(SEARCH("Soakage",'Geotech Data'!G:G)))>0)*1)) (The last one crashed my Excel when I tried to step through it to find where it went wrong)

and CONCATENATE is also not an option, as the dataset is almost 1000x11 already.


Solution

  • Caveat: this answer relies on using the SUBSTITUTE function with an array. Not having access to Excel 2016, I cannot confirm if this functionality was available then or not. For a quick check, use the formula =SUBSTITUTE("ABCD",{"A","B";"C","D"},""), which should output a 2-by-2 square of values if SUBSTITUTE accepts arrays in your version of Excel


    To make this easier, I am going to recommend (and, therefore, assume) that the strings for which you want to search are stored in your workbook, as a column. I am then going to assume that this column has been turned into a Named Range, called rngSearchStrings.

    Next, I am going to recommend (and, therefore, assume) that you set up a Dynamic Named Range for 'Geotech Data'!G:G that limits it to only the Used Range of the Worksheet (something like ='Geotech Data'!$G$1:INDEX('Geotech Data'!$G:$G, COUNTA('Geotech Data'!$G:$G))), and called it rngDataToSearch. This will greatly speed up any function you are trying to run on the data.

    If we then change rngSearchStrings from a Column to a Row with the TRANSPOSE function, we can produce a grid of results with SUBSTITUTE, like so:

    =SUBSTITUTE(rngDataToSearch, TRANSPOSE(rngSearchStrings), "")
    

    This will be the same number of rows as rngDataToSearch, and 27 columns wide (i.e. one for each entry in rngSearchStrings). We can check if this is changed (i.e. the search string was found) like normal:

    =SUBSTITUTE(rngDataToSearch, TRANSPOSE(rngSearchStrings), "")<>rngDataToSearch
    

    At this point, you are looking to run an OR statement on each Row, and count the TRUE results. Well, a double-negation will convert the Boolean True/False values into 1s and 0s, and then you just need to run a row-wise summation:

    =MMULT(--(SUBSTITUTE(rngDataToSearch, TRANSPOSE(rngSearchStrings), "")<>rngDataToSearch), --(rngSearchStrings<>""))
    

    This will collapse your 27-columns-wide table into a single column that counts how many of your Search Strings are in each row of Column G.

    All you need to do is add up how many of those values are greater than zero:

    =SUM(--(MMULT(--(SUBSTITUTE(rngDataToSearch, TRANSPOSE(rngSearchStrings), "")<>rngDataToSearch), --(rngSearchStrings<>""))>0))