java android web-scraping jsoup

What is the best way to scrape this HTML for an Android app?


What is the best way to scrape the HTML below from a web page? I want to pull out Apple, Orange, and Grape and put them into a dropdown menu in my Android app. Should I use jsoup for this, and if so, what would be the best way to do it? Or should I use regex instead?

<select name="fruit" id="fruit" >
<option value="APPLE">Apple</option>
<option value="ORANGE">Orange</option>
<option value="GRAPE">Grape</option>
</select>

Solution

  • Depends, but I'd go with an XML/HTML parser. Don't use regex.

    Example with jsoup:

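    // Fetch and parse the page; someUrl is the address of the page to scrape.
    Document doc = Jsoup.connect(someUrl).get();
    // Select every <option> inside <select id="fruit">.
    Elements options = doc.select("select#fruit option");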
    

    For more, see the jsoup documentation on selector syntax.
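To get from the selected elements to the strings a dropdown needs, something like the following could work. It is a minimal sketch, not a definitive implementation: the class name `FruitScraper` and the method `extractLabels` are made up for illustration, and it parses an HTML string with `Jsoup.parse` instead of fetching `someUrl`, so it can run without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names here are illustrative only.
class FruitScraper {

    // Returns the visible text of each <option> inside <select id="fruit">.
    static List<String> extractLabels(String html) {
        Document doc = Jsoup.parse(html);
        List<String> labels = new ArrayList<>();
        for (Element option : doc.select("select#fruit option")) {
            labels.add(option.text());
            // option.attr("value") would give "APPLE", "ORANGE", ... instead.
        }
        return labels;
    }

    public static void main(String[] args) {
        String html = "<select name=\"fruit\" id=\"fruit\" >"
                + "<option value=\"APPLE\">Apple</option>"
                + "<option value=\"ORANGE\">Orange</option>"
                + "<option value=\"GRAPE\">Grape</option>"
                + "</select>";
        System.out.println(extractLabels(html)); // prints [Apple, Orange, Grape]
    }
}
```

On Android, the resulting list could be handed to an `ArrayAdapter<String>` and set on a `Spinner`. If the page is fetched with `Jsoup.connect(someUrl).get()`, that call has to run off the main thread, since Android does not allow network access on the UI thread.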


    Best way?

    I would go with either the built-in DOM parser or the built-in SAX parser. If you're going to be parsing a large document, SAX is faster, because it streams the input instead of building the whole tree in memory. If the document is small, there's not much difference. More on SAX vs DOM.
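As a rough illustration of the two built-in options, the sketch below reads the option labels once with DOM and once with SAX. It assumes the input is well-formed XML, which holds for the snippet in the question but often not for real-world HTML pages. The class name `XmlFruitParser` and both method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; assumes well-formed XML input.
class XmlFruitParser {

    // DOM: load the whole document into a tree, then query it.
    static List<String> labelsWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList options = doc.getElementsByTagName("option");
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < options.getLength(); i++) {
                labels.add(options.item(i).getTextContent().trim());
            }
            return labels;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // SAX: react to parse events as the input streams by; no tree is built.
    static List<String> labelsWithSax(String xml) {
        final List<String> labels = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside an <option>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("option".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("option".equals(qName)) {
                    labels.add(current.toString().trim());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return labels;
    }

    public static void main(String[] args) {
        String xml = "<select name=\"fruit\" id=\"fruit\" >"
                + "<option value=\"APPLE\">Apple</option>"
                + "<option value=\"ORANGE\">Orange</option>"
                + "<option value=\"GRAPE\">Grape</option>"
                + "</select>";
        System.out.println(labelsWithDom(xml)); // prints [Apple, Orange, Grape]
        System.out.println(labelsWithSax(xml)); // prints [Apple, Orange, Grape]
    }
}
```

Both methods throw on markup that is not well-formed (an unclosed `<br>`, for example), so for arbitrary web pages a lenient HTML parser is likely the safer choice.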