regexgoogle-sheetsgoogle-sheets-formulare2

Get a match when there are duplicate letters in a string


I have a list of inputs in google sheets,

Input Desired Output "To demonstrate only not an input" The repeated letters
Outdoors Match o
dog No Match
step No Match
bee Match e
Chessboard Match s
Cookbooks Match o, k

How do I verify if all letters are unique in a string without splitting it?

In other words if the string has one letter or more occurred twice or more, return TRUE

My process so far

I tried this solution in addition to splitting the string and dividing the length of the string on the COUNTA of unique letters of the string, if = 1 "Match", else "No match"

Or using regex
I found a method to match a letter is occure in a string 2 times this demonstration with REGEXEXTRACT But wait what needed is get TRUE when the letters are not unique in the string

=REGEXEXTRACT(A1,"o{2}?")

Returns:

oo

Something like this would do

=REGEXMATCH(Input,"(anyletter){2}?")

OR like this

=REGEXMATCH(lower(A6),"[a-zA-Z]{2}?")

Notes

More examples

Input Desired Output
Professionally Match
Attractiveness Match
Uncontrollably Match
disreputably No Match
Recommendation Match
Interrogations Match
Aggressiveness Match
doublethinks No Match

Solution

  • You are explicitly asking for an answer using a single regular expression. Unfortunately there is no such thing as a backreference to a former capture group using RE2. So if you'd spell out the answer to your problem it would look like:

    =INDEX(IF(A2:A="","",REGEXMATCH(A2:A,"(?i)(?:a.*a|b.*b|c.*c|d.*d|e.*e|f.*f|g.*g|h.*h|i.*i|j.*j|k.*k|l.*l|m.*m|n.*n|o.*o|p.*p|q.*q|r.*r|s.*s|t.*t|u.*u|v.*v|w.*w|x.*x|y.*y|z.*z)")))
    

    Since you are looking for case-insensitive matching (?i) modifier will help to cut down the options to just the 26 letters of the alphabet. I suppose the above can be written a bit neater like:

    =INDEX(IF(A2:A="","",REGEXMATCH(A2:A,"(?i)(?:"&TEXTJOIN("|",1,REPLACE(REPT(CHAR(SEQUENCE(26,1,65)),2),2,0,".*"))&")")))
    

    EDIT 1:

    The only other reasonable way to do this (untill I learned about the PREG supported syntax of the matches clause in QUERY() by @DoubleUnary) with a single regex other than the above is to create your own UDF in GAS (AFAIK). It's going to be JavaScript based thus supporting a backreferences. GAS is not my forte, but a simple example could be:

    function REGEXMATCH_JS(s) {
      if (s.map) {
        return s.map(REGEXMATCH_JS);
      } else {
        return /([a-z]).*?\1/gi.test(s);
      }
    }
    

    The pattern ([a-z]).*?\1 means:

    The match is global and case-insensitive. You can now call:

    =INDEX(IF(A2:A="","",REGEXMATCH_JS(A2:A)))
    

    EDIT 2:

    For those that are benchmarking speed, I am not testing this myself but maybe this would speed things up:

    =INDEX(REGEXMATCH(A2:INDEX(A:A,COUNTA(A:A)),"(?i)(?:a.*a|b.*b|c.*c|d.*d|e.*e|f.*f|g.*g|h.*h|i.*i|j.*j|k.*k|l.*l|m.*m|n.*n|o.*o|p.*p|q.*q|r.*r|s.*s|t.*t|u.*u|v.*v|w.*w|x.*x|y.*y|z.*z)"))
    

    Or:

    =INDEX(REGEXMATCH(A2:INDEX(A:A,COUNTA(A:A)),"(?i)(?:"&TEXTJOIN("|",1,REPLACE(REPT(CHAR(SEQUENCE(26,1,65)),2),2,0,".*"))&")")) 
    

    Or:

    =REGEXMATCH_JS(A2:INDEX(A:A,COUNTA(A:A)))
    

    Respectively. Knowing there is a header in 1st row.