Tags: java, regex, regex-lookarounds, regex-group, multiple-matches

How can I get multiple Java regex matches on only certain lines?


I'm calling an API that I cannot change, so I cannot do this as two sequential regexes or anything like that; it has to be a single pattern. The API is written something like this (simplified, of course):

void apiMethod(final String regex) {
    final String input = 
        "bad:    thing01, thing02, thing03 \n" +
        "good:   thing04, thing05, thing06 \n" +
        "better: thing07, thing08, thing09 \n" +
        "worse:  thing10, thing11, thing12 \n";

    final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);

    final Matcher matcher = pattern.matcher(input);

    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}

I invoke it something like this:

apiMethod("(thing[0-9]+)");

I want to see six lines printed out, one for each of thing04 through thing09, inclusive. The call above prints all twelve instead. I have not been successful so far. Some things I have tried that did not work:

And many more, too numerous to list. I've tried various look-behinds, to no avail.

What I want is all the strings that match "thing[0-9]+" but only those from lines that begin with "good:" or "better:".

Or, stated more generally, I want multiple matches from a multiline pattern but only from lines with a certain prefix.


Solution

  • You can use a \G-based pattern (the API already compiles it in multiline mode):

    (?:\G(?!^),|^(?:good|better):)\s*(thing[0-9]+)
    

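    The \G anchor matches at the position where the previous successful match ended, so the first alternative only accepts a comma that directly follows the last thing found; that forces matches to be contiguous. A chain can therefore only start through the second alternative, at the beginning of a line prefixed with "good:" or "better:". In Java, \G also matches at the start of the input before any match has been made, and the (?!^) lookahead rules out that case.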


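    If lines are short, you can also do this with a bounded variable-length lookbehind. The {0,1000} quantifier keeps the lookbehind finite, so it only finds a thing that starts within 1000 characters of the prefix, and since . does not match a newline the prefix has to be on the same line: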

    (?<=^(?:good|better):.{0,1000})(thing[0-9]+)
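
To check both patterns against the sample text from the question, here is a minimal sketch. The class name PrefixedLineMatches and the collect helper are made up for illustration: collect stands in for apiMethod and compiles with Pattern.MULTILINE the same way, but returns the group(1) values instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixedLineMatches {
    // Same sample text as in the question.
    static final String INPUT =
        "bad:    thing01, thing02, thing03 \n" +
        "good:   thing04, thing05, thing06 \n" +
        "better: thing07, thing08, thing09 \n" +
        "worse:  thing10, thing11, thing12 \n";

    // Hypothetical stand-in for apiMethod: same compile flags and find() loop,
    // but group(1) is collected into a list rather than printed.
    static List<String> collect(String regex) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(INPUT);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // \G-based pattern: a chain starts at a "good:"/"better:" prefix,
        // then continues only across directly following commas.
        String contiguous = "(?:\\G(?!^),|^(?:good|better):)\\s*(thing[0-9]+)";
        // Bounded lookbehind: the prefix must sit at the start of the same line,
        // at most 1000 characters before the match.
        String lookbehind = "(?<=^(?:good|better):.{0,1000})(thing[0-9]+)";

        System.out.println(collect(contiguous));
        System.out.println(collect(lookbehind));
    }
}
```

Both calls should give thing04 through thing09 and nothing from the "bad:" or "worse:" lines. Note that the \G version relies on the items being separated by a comma plus optional whitespace; with a different separator, the first alternative would need adjusting.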