
Java regex: Remove (double) negative lookahead and lookbehind


I have the following regex fragments that I wrap around a pattern to match it in a string:

(?i)(?<![^\\s\\p{Punct}]) : Look behind

(?![^\\s\\p{Punct}]) : Look ahead

Below is an example that demonstrates how I am using it:

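import java.util.regex.Pattern;

public class RegexDemo { // wrapper class name is arbitrary
    public static void main(String[] args) {
        String patternStart = "(?i)(?<![^\\s\\p{Punct}])", patternEnd = "(?![^\\s\\p{Punct}])";
        String text = "this is some paragraph";
        // Pattern.quote makes the phrase match literally between the two lookarounds
        System.out.println(Pattern.compile(patternStart + Pattern.quote("some paragraph") + patternEnd).matcher(text).find());
    }
}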

It returns true, which is the expected result. However, since the regex uses a double negative (a negative lookahead/lookbehind combined with a negated character class, ^), I thought removing both negatives should give the same result. So I tried the following:

String patternStart = "(?i)(?<=[\\s\\p{Punct}])", patternEnd = "(?=[\\s\\p{Punct}])";

However, it doesn't work as expected. I even tried adding ^ and/or $ at the end of the character class (inside the square brackets) to match the beginning/end of the string, but still no luck.

Is it possible to convert these regexes into positive lookarounds?


Solution

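  • Yes, it is possible, but it is less efficient than what you have. A positive lookaround must match an actual character, so the start and end of the string, which the negative lookarounds accept implicitly, have to be added explicitly with alternation: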

    String patternStart = "(?i)(?<=^|[\\s\\p{Punct}])", patternEnd = "(?=[\\s\\p{Punct}]|$)";
                                   ^^                                                   ^^ 
    

    The (?<=^|[\\s\\p{Punct}]) lookbehind requires either the start of the string (^) or, via the | alternation, a whitespace or punctuation character ([\\s\\p{Punct}]) immediately before the match. The positive lookahead (?=[\\s\\p{Punct}]|$) requires either a whitespace or punctuation character immediately after the match, or the end of the string.

    If you just add ^ or $ into the character classes, as in [\\s\\p{Punct}^] and [\\s\\p{Punct}$], they are parsed as literal caret and dollar characters (which \\p{Punct} already covers), not as anchors.
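
To see the difference between the three variants, here is a minimal sketch that runs each pair of lookarounds against a few inputs. The class name LookaroundDemo, the helper found, the constant names, and the sample texts other than the first are made up for illustration; only the regex fragments come from the question and the answer above.

```java
import java.util.regex.Pattern;

class LookaroundDemo {
    // Original double-negative form from the question.
    static final String NEG_START = "(?i)(?<![^\\s\\p{Punct}])";
    static final String NEG_END = "(?![^\\s\\p{Punct}])";

    // Naive positive form: needs an actual character on both sides.
    static final String NAIVE_START = "(?i)(?<=[\\s\\p{Punct}])";
    static final String NAIVE_END = "(?=[\\s\\p{Punct}])";

    // Positive form with alternation for the string boundaries.
    static final String POS_START = "(?i)(?<=^|[\\s\\p{Punct}])";
    static final String POS_END = "(?=[\\s\\p{Punct}]|$)";

    // Hypothetical helper: true if the quoted phrase is found in text
    // when wrapped in the given start/end lookarounds.
    static boolean found(String start, String end, String phrase, String text) {
        return Pattern.compile(start + Pattern.quote(phrase) + end).matcher(text).find();
    }

    public static void main(String[] args) {
        String phrase = "some paragraph";
        String[] texts = {
            "this is some paragraph",      // phrase at the end of the input
            "some paragraph is here",      // phrase at the start of the input
            "this is some paragraph, ok",  // phrase with characters on both sides
            "awesome paragraphs"           // phrase embedded inside other words
        };
        for (String text : texts) {
            System.out.printf("%-28s negative=%b naive=%b positive=%b%n",
                    text,
                    found(NEG_START, NEG_END, phrase, text),
                    found(NAIVE_START, NAIVE_END, phrase, text),
                    found(POS_START, POS_END, phrase, text));
        }
    }
}
```

The naive positive form should fail on the first two inputs, because there is no character after (or before) the phrase for the lookaround to match, while the negative form and the alternation form agree on all four.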