java regex

java pattern compile regex


String HTML = ...; // some HTML source code that contains the strings a and b

String a = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=6912391300348162186\">88</a>";

String b = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=18217435431424551679\">41</a>";

String ex = ?; // what should this regex be?

Pattern patternObject = Pattern.compile(ex);
Matcher matcherObject = patternObject.matcher(HTML);

while (matcherObject.find()) {
    System.out.println("DEBUG: Cite is " + matcherObject.group(1));
}

Hi, I am new to Java and regex, and I am wondering how I can write the String ex so that the loop prints only the following (I hope I am clear enough):

Cite is 88

Cite is 41


Solution

  • You can try this:

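    // (\\d+) captures the whole cites id; the original (\\d)+ kept only its last digit
    Pattern patternObject = Pattern.compile("<a class=\"cit-dark-link(.*?)cites=(\\d+)\">(.*?)</a>");
    Matcher matcherObject = patternObject.matcher(HTML);

    while (matcherObject.find()) {
        // group(3) is the link text, i.e. the citation count
        System.out.println("DEBUG: Cite is " + matcherObject.group(3));
    }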
    

    This prints:

    DEBUG: Cite is 88
    DEBUG: Cite is 41
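
If you would rather keep matcherObject.group(1) as in the question, a variant with a single capturing group might look like the sketch below. It assumes every link of interest has class="cit-dark-link" as its first attribute and contains only digits as its text. The class name CiteExtractor and the method extractCites are made up for illustration, and HTML here is just a and b concatenated, standing in for the real page source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class CiteExtractor {

    // [^>]* skips the remaining attributes of the opening tag;
    // (\\d+) is the only capturing group, so the count is group(1).
    private static final Pattern CITE_LINK =
            Pattern.compile("<a class=\"cit-dark-link\"[^>]*>(\\d+)</a>");

    // Returns the text of every matching link, in document order.
    static List<String> extractCites(String html) {
        List<String> cites = new ArrayList<>();
        Matcher matcherObject = CITE_LINK.matcher(html);
        while (matcherObject.find()) {
            cites.add(matcherObject.group(1));
        }
        return cites;
    }

    public static void main(String[] args) {
        String a = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=6912391300348162186\">88</a>";
        String b = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=18217435431424551679\">41</a>";
        String HTML = a + b; // assumption: stands in for the real page source

        for (String cite : extractCites(HTML)) {
            System.out.println("DEBUG: Cite is " + cite);
        }
    }
}
```

Using [^>]* instead of (.*?) keeps the match inside one opening tag, so no extra groups are needed. If the markup changes (different attribute order, nested tags inside the link), this pattern will stop matching and an HTML parser would be a safer choice.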