What is the best way to scrape the HTML below from a web page? I want to pull out Apple, Orange and Grape and put them into a dropdown menu in my Android app. Should I use Jsoup for this, and if so, how? Or should I use a regex instead?
<select name="fruit" id="fruit" >
<option value="APPLE">Apple</option>
<option value="ORANGE">Orange</option>
<option value="GRAPE">Grape</option>
</select>
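It depends, but I'd go with an XML/HTML parser. Don't use regex: HTML isn't a regular language, and small changes in the markup (attribute order, whitespace, quoting) will break a pattern.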
Example with jsoup:
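// Fetch the page and select every <option> inside <select id="fruit">
Document doc = Jsoup.connect(someUrl).get();
Elements options = doc.select("select#fruit option");
List<String> fruits = options.eachText(); // visible labels: Apple, Orange, Grape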
More on jsoup selector syntax.
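I would go with either the built-in DOM parser or the SAX parser. Both are XML parsers, so the page has to be well-formed for them to work. If you're going to be parsing a large document, SAX is faster and uses less memory, because it streams through the input instead of building the whole tree. If the document is small, then there's not much difference. More on SAX vs DOM.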
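For what it's worth, here is a minimal sketch of the DOM route using only the standard javax.xml.parsers classes. It assumes the markup is well-formed XML, which is true of the snippet in the question but often not of a full real-world page. The class name FruitOptions and the method optionLabels are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
final class FruitOptions {

    // Returns the visible text of every <option> element in the markup.
    // Assumption: the markup is well-formed XML; otherwise parsing fails.
    static List<String> optionLabels(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));

            NodeList options = doc.getElementsByTagName("option");
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < options.getLength(); i++) {
                labels.add(options.item(i).getTextContent().trim());
            }
            return labels;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
    }

    public static void main(String[] args) {
        String markup =
                "<select name=\"fruit\" id=\"fruit\">"
              + "<option value=\"APPLE\">Apple</option>"
              + "<option value=\"ORANGE\">Orange</option>"
              + "<option value=\"GRAPE\">Grape</option>"
              + "</select>";
        System.out.println(optionLabels(markup)); // prints [Apple, Orange, Grape]
    }
}
```

On Android, the returned list could then be handed to an ArrayAdapter backing a Spinner. If the page you fetch turns out not to be well-formed, this approach will throw, and a lenient HTML parser is probably the safer choice.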