Using the jsoup library, I am trying to get the href of an <a> element that contains a specified text, which differs each time.
Example:
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.select.Elements

public class GlobVars {
    public static Document currentPageSource
    public static String currentTitle
}

def get_url() {
    String url = "https://www.website.com/"
    GlobVars.currentPageSource = Jsoup.connect(url).get()
    Elements wElements = GlobVars.currentPageSource.select('a[class="class-name"]:contains('+GlobVars.currentTitle+')')
    if(wElements) {
        /*
         * Do stuff...
         */
    }
}
The problem occurs when GlobVars.currentTitle contains a single quote character. For example, if GlobVars.currentTitle is I am here, it works fine. But if GlobVars.currentTitle is I'm here, I get this error: Did not find balanced marker at 'I'.
I tried using the GlobVars.currentTitle variable with double-quoted, triple-single-quoted, and triple-double-quoted strings, but I get the same error.
I also read https://github.com/jhy/jsoup/issues/1105, but the "trick" of escaping the quotes cannot be used in my case.
Any idea how I can fix this?
// @Grab(group='org.jsoup', module='jsoup', version='1.14.3')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.select.Elements
def html = """
<html>
<body>
<a class="c1" href="#1">i'm the one</a>
<a class="c1" href="#2">i am the one</a>
</body>
</html>
"""
def desiredText = "i'm the one"
// Backslash-escape quotes, backslashes, slashes, pipes, parentheses and brackets
// so the selector parser reads them literally. Other characters may need escaping too.
desiredText = desiredText.replaceAll(/(['"\\\/\|(\)\[\]])/, '\\\\$1')
Document currentPageSource = Jsoup.parse(html)
Elements wElements = currentPageSource.select('a[class="c1"]:contains('+ desiredText +')')
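To make it easier to see what the escaping step hands to jsoup, here is a minimal sketch that wraps the same replaceAll call in a helper and prints the resulting selector. The helper name escapeForContains is hypothetical and not part of jsoup; the sketch only manipulates strings, so it runs without the jsoup dependency.

```groovy
// Hypothetical helper (not a jsoup API): backslash-escapes the same set of
// characters as the replaceAll call in the snippet above.
String escapeForContains(String text) {
    text.replaceAll(/(['"\\\/\|(\)\[\]])/, '\\\\$1')
}

// Build the selector from the escaped text.
def selector = 'a[class="c1"]:contains(' + escapeForContains("i'm the one") + ')'
println selector
// prints: a[class="c1"]:contains(i\'m the one)
```

The single quote now arrives at the selector parser preceded by a backslash, which appears to be what keeps it from being read as the start of a quoted section.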
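Or filter the matches in Groovy instead of inside the selector, so no escaping is needed:
def desiredText = "i'm the one"
// findAll returns a plain List of Element rather than Elements, hence def
def wElements = currentPageSource.select('a[class="c1"]').findAll{it.html().contains(desiredText)}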
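Since the original goal was the href, the filtering approach can be taken one step further. The following is only a sketch under a few assumptions: it compares against text() rather than html(), and lowercases both sides, because the jsoup documentation describes :contains as a case-insensitive match on an element's text. The sample HTML and the variable names doc, links, matches and hrefs are made up for illustration.

```groovy
@Grab(group='org.jsoup', module='jsoup', version='1.14.3')
import org.jsoup.Jsoup

// Made-up sample markup; note the capital "I" in the first link.
def html = '''
<a class="c1" href="#1">I'm the one</a>
<a class="c1" href="#2">i am the one</a>
'''
def desiredText = "i'm the one"

def doc = Jsoup.parse(html)
def links = doc.select('a[class="c1"]')

// Plain string comparison, so the quote in desiredText needs no escaping.
def matches = links.findAll { it.text().toLowerCase().contains(desiredText.toLowerCase()) }

// Pull the href attribute out of every matching element.
def hrefs = matches.collect { it.attr('href') }
println hrefs
// prints: [#1]
```

Doing the comparison in Groovy keeps the selector free of user-supplied text, at the cost of fetching every a.c1 element before filtering.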