
Regex optional capturing group?


After hours of searching, I decided to ask this question. Why doesn't the regular expression ^(dog).+?(cat)? work the way I expect, i.e. capture the first dog and the cat if there is one? What am I missing here? I am testing against these strings:

dog, cat
dog, dog, cat
dog, dog, dog

Solution

  • The reason you do not get the optional cat after the reluctantly quantified .+? is that (cat)? is both optional and unanchored, so the engine is not forced to match it. The reluctant .+? consumes as little as it can (a single character), and (cat)? then succeeds by matching nothing, so the overall match ends before the engine ever reaches the cat.
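
    To see this concretely, here is a minimal sketch that prints the whole match alongside both groups for the unanchored pattern. The class name LazyOptionalDemo and the helper match are made up for illustration; only java.util.regex is assumed.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class LazyOptionalDemo {
        // Hypothetical helper: returns {whole match, group 1, group 2},
        // or null when the pattern does not match at all.
        static String[] match(String regex, String input) {
            Matcher m = Pattern.compile(regex).matcher(input);
            if (!m.find()) {
                return null;
            }
            return new String[] {m.group(), m.group(1), m.group(2)};
        }

        public static void main(String[] args) {
            for (String s : new String[] {"dog, cat", "dog, dog, cat", "dog, dog, dog"}) {
                // Unanchored pattern from the question.
                String[] r = match("^(dog).+?(cat)?", s);
                System.out.println("match=\"" + r[0] + "\" group1=" + r[1] + " group2=" + r[2]);
            }
        }
    }
    ```

    For all three strings the whole match is just "dog," and group 2 is null: .+? stops after the comma, and (cat)? matches the empty string at the following space.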

    If you anchor the pattern at the end of the string, i.e. use ^(dog).+?(cat)?$, the engine has to keep extending .+? until the rest of the pattern can reach the end of the input, and the cat is captured:

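    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 2 is the optional cat; it is null when the group did not participate.
    Pattern p = Pattern.compile("^(dog).+?(cat)?$");
    for (String s : new String[] {"dog, cat", "dog, dog, cat", "dog, dog, dog"}) {
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.group(1) + " " + m.group(2));
        }
    }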
    

    This prints (demo 1)

    dog cat
    dog cat
    dog null
    

    Regarding the follow-up question, "Do you happen to know how to deal with it in case there's something after cat?":

    You can deal with it by constructing a trickier expression whose repeated part will not consume a cat, like this:

    ^(dog)(?:[^c]|c[^a]|ca[^t])+(cat)?
    

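    Now the cat can appear anywhere in the string, and no end anchor is needed (demo 2). Be aware that the alternation only approximates "anything except cat": in an input such as ccat, c[^a] consumes the first two characters and the cat is missed.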
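
    As a rough check of the expression above, the sketch below runs it against inputs with text after the cat. It also tries a different technique that is not part of the original answer: a negative lookahead, (?:(?!cat).)+, which consumes one character at a time only where cat does not start. The class name CatAnywhereDemo, the constants NEGATED and LOOKAHEAD, and the helper cat are all hypothetical names chosen for this example.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class CatAnywhereDemo {
        // Pattern from the answer: the repeated group is an alternation of negated classes.
        static final Pattern NEGATED = Pattern.compile("^(dog)(?:[^c]|c[^a]|ca[^t])+(cat)?");
        // Assumed alternative: consume a character only where "cat" does not start.
        static final Pattern LOOKAHEAD = Pattern.compile("^(dog)(?:(?!cat).)+(cat)?");

        // Returns group 2 (the optional cat), or null if the group did not
        // participate or the pattern did not match.
        static String cat(Pattern p, String input) {
            Matcher m = p.matcher(input);
            return m.find() ? m.group(2) : null;
        }

        public static void main(String[] args) {
            for (String s : new String[] {"dog, cat, bird", "dog, dog, dog", "dog, ccat"}) {
                System.out.println(s + " -> negated=" + cat(NEGATED, s)
                        + " lookahead=" + cat(LOOKAHEAD, s));
            }
        }
    }
    ```

    Both patterns capture the cat in "dog, cat, bird" and leave group 2 null for "dog, dog, dog". They differ on "dog, ccat", where only the lookahead version finds the cat, so it may be the safer choice if such inputs can occur.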