I've been trying for hours with regex: I need a regex that selects everything between underscores. Example:
_italic_
The only condition is that it must ignore \_ (a backslash followed by an underscore).
So this would be a match (all the text inside the underscores):
_italic some text 123 \_*%&$ _
So far I have this regex:
(\_.*?\_)(?!\\\_)
But it does not ignore the \_.
Which regex would work?
You can use
(?s)(?<!\\)(?:\\{2})*_((?:[^\\_]|\\.)+)_
See the regex demo. Details:
(?s) - an inline embedded flag option equal to Pattern.DOTALL
(?<!\\)(?:\\{2})* - a position that is not immediately preceded with a backslash and then zero or more sequences of double backslashes
_ - an underscore
((?:[^\\_]|\\.)+) - Capturing group 1: one or more occurrences of any char other than a \ and _, or any escaped char (a combination of a \ and any one char)
_ - an underscore
See the Java demo:
import java.util.*;
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        List<String> strs = Arrays.asList("xxx _italic some text 123 \\_*%&$ _ xxx",
                "\\_test_test_");
        String regex = "(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_";
        Pattern p = Pattern.compile(regex);
        for (String str : strs) {
            Matcher m = p.matcher(str);
            List<String> result = new ArrayList<>();
            // Collect the text between each pair of unescaped underscores
            while (m.find()) {
                result.add(m.group(1));
            }
            System.out.println(str + " => " + String.join(", ", result));
        }
    }
}
Output:
xxx _italic some text 123 \_*%&$ _ xxx => italic some text 123 \_*%&$
\_test_test_ => test
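To see what the (?<!\\)(?:\\{2})* prefix does with runs of backslashes, here is a small additional sketch. It is not part of the original demo: the class name EscapedUnderscoreDemo and the helper italics are made up for illustration, and it assumes the same pattern as above. An odd number of backslashes before an underscore leaves that underscore escaped, while an even number means the backslashes escape each other and the underscore is a real delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedUnderscoreDemo {
    // Same pattern as in the demo above.
    static final Pattern ITALIC =
            Pattern.compile("(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_");

    // Hypothetical helper: returns group 1 of every match in the input.
    static List<String> italics(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ITALIC.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // One backslash: the first underscore is escaped and cannot open a span.
        System.out.println(italics("\\_a_b_"));       // [b]
        // Two backslashes: they escape each other, so the underscore opens a span.
        System.out.println(italics("\\\\_a_b_"));     // [a]
        // Three backslashes: one pair plus one escape, so the underscore is escaped again.
        System.out.println(italics("\\\\\\_a_b_"));   // [b]
        // An escaped underscore inside a span is kept as part of the captured text.
        System.out.println(italics("_a\\_b_"));       // [a\_b]
    }
}
```

The captured text still contains the backslash of any escaped character, so a caller that wants the literal text would need to strip those escapes in a separate step.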