java regex

Java Regex, capturing items inside of "[...]"


I am trying to capture text inside XML tags like <title>...</title>, and the content of strings like "[[A]]" that appear inside those tags. So far my patterns are as follows:

    Pattern titleText = Pattern.compile("<title>([A-Z])</title>");
    Pattern extractLink = Pattern.compile("(\[\[([A-Z])\]\])");

I'm getting an error on the second pattern, and it's because of the backslashes. However, I'm not sure how to tell the regex that I want to escape the [ and ] characters so it captures the text inside them.
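The error comes from Java's string-literal rules rather than from the regex itself: \[ is not a legal escape sequence in a Java string, so the compiler rejects the literal before Pattern ever sees it. Each backslash the regex needs has to be doubled. Below is a minimal sketch of two ways to write the bracket pattern, assuming the same single-letter [A-Z] content. The class and method names are hypothetical, and the outer group from the original pattern is dropped so group(1) is the letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeBrackets {
    // "\\[" in Java source is the two characters \[ , which the regex
    // engine reads as a literal '['. The same applies to "\\]".
    static final Pattern ESCAPED = Pattern.compile("\\[\\[([A-Z])\\]\\]");

    // Pattern.quote wraps its argument in \Q...\E, so the brackets are
    // treated literally without any manual backslashes.
    static final Pattern QUOTED = Pattern.compile(
            Pattern.quote("[[") + "([A-Z])" + Pattern.quote("]]"));

    // Hypothetical helper: returns the captured letter, or null if the
    // whole input is not of the form [[X]].
    static String letter(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(letter(ESCAPED, "[[A]]")); // prints A
        System.out.println(letter(QUOTED, "[[B]]"));  // prints B
    }
}
```

Pattern.quote is handy when the literal part might later change, since you never have to work out which characters need escaping.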

An example of input I am trying to capture is:

    <title>random text [[A]] more random text [[B]] ...</title>

Here [[A]] and [[B]] can appear any number of times, and I want to find all of them.

Any help/advice would be greatly appreciated.


Solution

  • Within a single match, Java can't return a capturing group an arbitrary number of times: a repeated group such as (...)+ keeps only its last capture, so you would have to spell out each group in the pattern. However, here is an alternative solution that splits the String on "[[" and then cuts each piece at "]]":

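    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TitleLinks {
        public static void main(String[] args) {
            // Capture everything between <title> and </title> (lazily).
            Pattern titleText = Pattern.compile("<title>(.*?)</title>");
            String input = "<title>random text [[A]] more random text [[B]] ...</title>";
            String text = "";

            Matcher m = titleText.matcher(input);
            if (m.find()) {
                text = m.group(1);
            }

            // Split on the literal "[[" (escaped once for Java, once for regex).
            String[] parts = text.split("\\[\\[");

            // parts[0] is the text before the first "[[", so start at 1.
            for (int i = 1; i < parts.length; ++i) {
                int index = parts[i].indexOf("]]");
                if (index < 0) {
                    continue; // "[[" with no closing "]]": skip instead of throwing
                }
                String match = parts[i].substring(0, index);
                System.out.println("Found a match: " + match);
            }
        }
    }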
    

  Output:

    Found a match: A
    Found a match: B
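The limitation above applies to one match: a quantified group remembers only its last capture. Calling Matcher.find() in a loop sidesteps it, because each call resumes after the previous match and returns the next [[...]] occurrence. The sketch below swaps in that find() loop in place of split(), assuming the bracketed text can be anything up to the first ]] on the same line (the class and method names are hypothetical). For contrast, it also shows a single match with a repeated group returning only the last item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllLinks {
    // Literal [[, then a lazily matched body, then literal ]].
    static final Pattern LINK = Pattern.compile("\\[\\[(.*?)\\]\\]");

    // Each find() call continues after the previous match, so group(1)
    // is read once per [[...]] occurrence.
    static List<String> findAll(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    // For contrast: one match with a quantified group keeps only the last capture.
    static String lastCaptureOnly(String text) {
        Matcher m = Pattern.compile("(?:.*?\\[\\[(.*?)\\]\\])+").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<title>random text [[A]] more random text [[B]] ...</title>";
        Matcher title = Pattern.compile("<title>(.*?)</title>").matcher(input);
        if (title.find()) {
            String text = title.group(1);
            for (String match : findAll(text)) {
                System.out.println("Found a match: " + match);
            }
            System.out.println("Repeated group alone: " + lastCaptureOnly(text));
        }
    }
}
```

Run as-is, it should print "Found a match: A", "Found a match: B", and then "Repeated group alone: B", which is the single-match behaviour the answer describes.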