I am re-posting this question. I am trying to extract data from an unordered list. In my previous question I had the format wrong. The markup on the website I am extracting the data from is formatted correctly:
<ul>
<li>
<i>
<a class="mw-redirect" title="title1" href="yahoo.com">used to be a best email</a>
</i>
(1999)
</li>
<li>
<i>
<a title="title2" href="google.com">Best search enginee We Will Go</a>
</i>
(1999)
</li>
<li>
<i>
<a title="title3" href="apple.com">Best Phone</a>
</i>
(1990)
</li>
</ul>
I want to print:
title1
google.com
yahoo.com
= used to be a best email Best search email will go Bestphone
and similarly for all the hrefs.
I have read the jsoup documentation.
Related question: "jsoup to get the data in a unorderedlist", but that one has formatting issues.
I tried what was suggested there, but it is not working.
I tried:
Document doc = Jsoup.connect(url).get();
Element link = doc.select("a").last();
String title1 = link.attr("title");
The issue is that this is a big page with a lot of information, and it contains many unordered lists.
My answer could be more accurate if you formatted and specified your requirements better. Is this what you were looking for?
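import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException
    {
        String html = "<ul><li><i><a class=\"mw-redirect\" title=\"title1\" href=\"yahoo.com\">used to be a best email</a></i>(1999)</li><li><i><a title=\"title2\" href=\"google.com\">Best search enginee We Will Go</a></i>(1999)</li><li><i><a title=\"title3\" href=\"apple.com\">Best Phone</a></i>(1990)</li></ul>";
        Document doc = Jsoup.parse(html);
        // Select every <a> nested inside <ul>, <li> and <i>
        Elements links = doc.select("ul li i a");
        for (Element element : links) {
            // Print the title attribute, the href attribute and the link text
            System.out.format("%s %s %s\n", element.attr("title"), element.attr("href"), element.text());
        }
    }
}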
If not, add a sample output section to your question.
Update:
How it works: ul li i a is a CSS selector. It means "take every a element that is located inside an i element, which is wrapped in li tags, which are in turn wrapped in ul tags". (Horrible explanation.)
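To make the "located inside" part a bit more concrete, here is a minimal sketch, assuming jsoup is on the classpath. The class name SelectorDemo, the count helper and the sample markup are made up for illustration. A space in a selector matches descendants at any depth, while > matches direct children only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical sample: the second link is wrapped in an extra <b> inside <i>.
    static final String HTML =
        "<ul>"
        + "<li><i><a href=\"yahoo.com\">direct</a></i></li>"
        + "<li><i><b><a href=\"google.com\">nested deeper</a></b></i></li>"
        + "</ul>";

    // Counts the elements that a CSS selector matches in the sample markup.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Space = descendant at any depth, so both links match.
        System.out.println(count("ul li i a"));
        // '>' = direct child only, so the link inside <b> is skipped.
        System.out.println(count("ul > li > i > a"));
    }
}
```

So the selector from the answer still finds a link when the page puts extra tags between the i and the a.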
For this sample you would get the same result from doc.select("a") as well. But being specific is better: you are parsing this data from a real website, where links can appear in different places with different ids or classes, and you are looking for these specific ones.
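A small sketch of why the narrower selector matters on a bigger page, again assuming jsoup is available. The class SpecificSelectorDemo, the hrefs helper, and the navigation link with example.com are invented for this example and are not part of the original page.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SpecificSelectorDemo {
    // Hypothetical page: one navigation link outside the list, one link inside it.
    static final String HTML =
        "<div id=\"nav\"><a title=\"home\" href=\"example.com\">Home</a></div>"
        + "<ul><li><i><a title=\"title3\" href=\"apple.com\">Best Phone</a></i>(1990)</li></ul>";

    // Returns the href attribute of every element matched by the selector.
    static List<String> hrefs(String selector) {
        Document doc = Jsoup.parse(HTML);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select(selector)) {
            result.add(link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Every link on the page, including the navigation link.
        System.out.println(hrefs("a"));
        // Only the links that sit inside ul > li > i.
        System.out.println(hrefs("ul li i a"));
    }
}
```

If the list you want has an id or class on the real page, putting that in front of the selector narrows things down further.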
Yes, if the selected elements have a title, an href and text, it will output that data.
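As a rough illustration of the case where an attribute is missing (assuming jsoup; MissingAttrDemo, the titles helper and the markup are hypothetical): attr returns an empty string when the attribute is absent, and an attribute selector such as a[title] skips those elements entirely.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class MissingAttrDemo {
    // Hypothetical sample: the second link has no title attribute.
    static final String HTML =
        "<ul>"
        + "<li><i><a title=\"title1\" href=\"yahoo.com\">used to be a best email</a></i></li>"
        + "<li><i><a href=\"google.com\">no title here</a></i></li>"
        + "</ul>";

    // Collects the title attribute of every element matched by the selector.
    static List<String> titles(String selector) {
        List<String> result = new ArrayList<>();
        for (Element link : Jsoup.parse(HTML).select(selector)) {
            // attr() yields "" rather than null when the attribute is absent.
            result.add(link.attr("title"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two entries, the second one is an empty string.
        System.out.println(titles("ul li i a"));
        // Only links that actually carry a title attribute.
        System.out.println(titles("ul li i a[title]"));
    }
}
```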