java matcher ioutils

java.util.regex.Matcher is unable to find anything inside a String obtained from Apache IOUtils


I have a code snippet to convert an input stream into a String. I then use java.util.regex.Matcher to find something inside the string.

The following works for me:

StringBuilder sb = new StringBuilder();
InputStream ins; // the InputStream data
BufferedReader br = new BufferedReader(new InputStreamReader(ins));
br.lines().forEach(sb::append);
br.close();

String data = sb.toString();
Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)");
Matcher matcher = pattern.matcher(data);
if (matcher.find()) {
   String searchedStr = matcher.group(1); // I find a match here
}

But if I try to replace BufferedReader with Apache IOUtils, I do not find any matches with the same string.

InputStream ins; // the InputStream data
String data = IOUtils.toString(ins, StandardCharsets.UTF_8);

Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)");
Matcher matcher = pattern.matcher(data);
if (matcher.find()) {
   String searchedStr = matcher.group(1); // I cannot find a match here
}

I have tried other StandardCharsets values besides UTF_8, but none of them worked.

I am unable to understand what is different here that would cause IOUtils to not match. Can someone kindly help me out here?


Solution

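  • The first snippet removes line breaks (BufferedReader.lines() returns each line without its terminator, and the lines are appended back to back); the second keeps them. By default, "." does not match line terminators, so (.*) stops at the end of the line.

    So let the pattern match across line breaks: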

    1. In the pattern, with embedded flags (s = DOTALL, m = MULTILINE)
    Pattern pattern = Pattern.compile("(?sm).*My_PATTERN:(.*)");
    2. In the pattern, with a character class that also matches line breaks
    Pattern pattern = Pattern.compile("[\\s\\S]*My_PATTERN:([\\s\\S]*)");
    3. With flags passed to compile
    Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)", Pattern.MULTILINE | Pattern.DOTALL);

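    All three let the group's value contain line breaks. DOTALL is the flag that does the work here; MULTILINE only changes how ^ and $ behave, and this pattern uses neither.
    Alternatively, remove the line breaks before matching, e.g.: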

    data = data.replaceAll("\\r?\\n", "");

    See: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#compile(java.lang.String,%20int)

    https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html
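
To see the difference concretely, here is a minimal sketch that reads the same bytes both ways and applies the pattern from the question. The class name LineBreakDemo, the SAMPLE text, and the helper methods are made up for illustration. To keep it free of dependencies, readVerbatim uses InputStream.readAllBytes (Java 9+) as a stand-in for IOUtils.toString, on the assumption that both return the stream's text with its line breaks intact.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakDemo {

    // Hypothetical input: the text after the marker continues on the next line.
    static final String SAMPLE = "header\nMy_PATTERN:first\nsecond\n";

    static InputStream sample() {
        return new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
    }

    // Same approach as the first snippet: lines() yields each line without its terminator.
    static String readJoiningLines(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            br.lines().forEach(sb::append);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Stand-in for IOUtils.toString: decodes the bytes as-is, line breaks included.
    static String readVerbatim(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns group 1 of the first match, or null when nothing matches.
    static String capture(String data, int flags) {
        Matcher m = Pattern.compile(".*My_PATTERN:(.*)", flags).matcher(data);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String joined = readJoiningLines(sample());
        String verbatim = readVerbatim(sample());

        System.out.println(capture(joined, 0));                // "firstsecond"
        System.out.println(capture(verbatim, 0));              // "first" (stops at the line break)
        System.out.println(capture(verbatim, Pattern.DOTALL)); // "first", line break, "second", line break
    }
}
```

With this sample, the default pattern still finds a match in the verbatim string, but the group ends at the first line break. If the text you expect in the group sits on a later line, it will look as though nothing was found, which is one plausible explanation for the behaviour in the question.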