regex

Regex: Match specific characters in any order without more occurrences of each character than specified


I have a list of characters, e.g. {o, b, c, c, d, o, f}.

If a string contains characters that are not in that list, I don't want it to be a match. If a string contains more occurrences of a character than there are occurrences of that character in that list, I don't want it to be a match.

The characters in the string may occur in any order, and not all of the characters have to appear. In the above example "foo" should be a match but "fooo" should not.

For instance, I have narrowed the above example down to (o{0,2}b?c{0,2}d?f?), but that doesn't quite work because order matters in that regex: I get a match for "oof" but not for "foo".


Solution

  • As gview says, regex is not the right tool. However, if your regex engine supports lookahead, you can use this:

    ^(?=(?:[^o]*o){0,2}[^o]*$)(?=(?:[^c]*c){0,2}[^c]*$)(?=[^b]*b?[^b]*$)(?=[^d]*d?[^d]*$)(?=[^f]*f?[^f]*$)[obcdf]+$
    

    It's a bit long but very simple:

    The string is matched with ^[obcdf]+$ (note the use of anchors).

    The lookaheads (?=...) are only checks (read as "followed by") and consume no characters:

    (?=(?:[^o]*o){0,2}[^o]*$)   # no more than 2 o until the end
    
    (?=[^b]*b?[^b]*$) # no more than 1 b until the end
    

    Each lookahead subpattern describes the whole string, which is why each one ends with $.
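
As a rough illustration of the answer above, here is a minimal Python sketch that builds the same kind of lookahead pattern from a list of characters and, for comparison, does the same check without regex. The names `build_pattern` and `fits` are hypothetical helpers made up for this example, and it assumes Python's `re` module, which supports lookahead.

```python
import re
from collections import Counter


def build_pattern(chars):
    """Build one lookahead per distinct character, as in the answer above.

    `chars` is the list of allowed characters, repeats included.
    (Hypothetical helper, not part of the original answer.)
    """
    counts = Counter(chars)
    lookaheads = []
    for ch, n in sorted(counts.items()):
        c = re.escape(ch)
        # At most n occurrences of ch between the start and the end.
        lookaheads.append(f"(?=(?:[^{c}]*{c}){{0,{n}}}[^{c}]*$)")
    allowed = "".join(re.escape(ch) for ch in sorted(counts))
    # The final class rejects any character that is not in the list.
    # Note: Python's $ also matches just before a trailing newline;
    # \Z would be stricter.
    return re.compile("^" + "".join(lookaheads) + f"[{allowed}]+$")


def fits(word, chars):
    """Non-regex version of the same check (hypothetical helper)."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means no character is used more often than the list allows.
    return bool(word) and not (Counter(word) - Counter(chars))


pattern = build_pattern("obccdof")
for word in ["foo", "oof", "fooo", "fox"]:
    print(word, bool(pattern.match(word)), fits(word, "obccdof"))
```

The `Counter` version is probably the easier one to maintain, which fits the remark that regex is not really the right tool here; the regex version is mainly useful when only a pattern can be supplied.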