css-selectors jsoup

JSoup Selector for multiple tags containing a phrase


In JSoup, how do I write a selector that matches an element that can be any one of several tags and that contains a text phrase?

For example, I want to match any header tag that contains "phrase".

This works, but I want to avoid repetition: :has(h1:contains(phrase), h2:contains(phrase), h3:contains(phrase))

This does not do what I want, because the phrase filter applies only to h3 (any h1 or h2 matches regardless of its text): :has(h1, h2, h3:contains(phrase))


Edit: Sorry, I didn't specify this earlier because I wanted to keep the question simple. I need a pure selector solution, as I'm actually using jsoup (https://jsoup.org/cookbook/extracting-data/selector-syntax), which "supports a CSS (or jquery) like selector syntax to find matching elements".


Solution

  • JSoup supports the select(String query) method not only on objects of type Document, but also on objects of type Elements, and select(String query) itself returns Elements. You can therefore chain several select calls to filter out what you want:

    Elements hWithText = doc.select("h1,h2,h3").select(":matchesOwn(regEx)");
    

    Of course, you can also use select(":contains(whatever)") if you do not need the flexibility of regular expressions.
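
The chained-select approach above can be sketched as a small runnable program. The class name HeaderPhraseDemo, the helper headersContaining, and the sample HTML are made up for illustration; only Jsoup.parse, Document.select, and Elements.select are jsoup calls. It assumes the phrase has no characters that are special in selector syntax, such as parentheses.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HeaderPhraseDemo {

    // Hypothetical helper: first narrow to h1-h3, then keep the elements
    // whose text contains the phrase (:contains is case-insensitive).
    // Caveat: the second select also looks inside each header, so an
    // element nested in a header (e.g. a <span>) can show up in the result.
    static Elements headersContaining(Document doc, String phrase) {
        return doc.select("h1, h2, h3").select(":contains(" + phrase + ")");
    }

    public static void main(String[] args) {
        // Sample markup, invented for this example.
        String html = "<h1>A phrase here</h1>"
                + "<h2>Nothing</h2>"
                + "<h3>Another phrase</h3>"
                + "<p>phrase in a paragraph</p>";
        Document doc = Jsoup.parse(html);

        for (Element e : headersContaining(doc, "phrase")) {
            System.out.println(e.tagName() + ": " + e.text());
        }
        // prints:
        // h1: A phrase here
        // h3: Another phrase
    }
}
```

The paragraph is skipped because the first select keeps only the header tags, and the h2 is dropped by the second select because its text lacks the phrase.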