Tags: java, html, web-scraping, jsoup, websphere-portal

Jsoup web scraping from a support portal


I'm new to jsoup, and I'm trying to scrape this portal:

https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing

From the list on this portal, I want to extract the topics that are marked as solved, i.e. the topics that show the special "solved" icon, as in the image below.

(Image: how a solved topic looks)

I connected to the page as follows and printed its title to make sure I was in the right place.

        // Requires org.jsoup.Jsoup and org.jsoup.nodes.Document
        Document document = Jsoup.connect("https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing").get();
        String title = document.title();
        System.out.println("Title: " + title);

After that I started looking at the HTML. I think the topics must be elements of the list inside the div with class messageList.MessageList.lia-component-forums-widget-message-list.lia-forum-message-list.lia-component-message-list, but I'm not sure about that. Then I noticed that each topic has a unique id, and that is where I'm stuck.

Could you please help me select all of these topic elements? And how can I filter out only the solved topics? To begin with, I just want to print the titles of these topics to the console in Java.

Sorry if this is a silly question.


Solution

  • Solved topics are represented by a table row with the class lia-list-row-thread-solved. The main thread list is in the element with the id grid, so you can select the solved rows with a single CSS selector:

            Document doc = Jsoup.connect(
                    "https://supportforums.cisco.com/t5/lan-switching-and-routing/bd-p/6016-discussions-lan-switching-routing")
                    .get();
            for (Element e : doc.select("#grid tr.lia-list-row-thread-solved")) {
                String text = e.text();
                System.out.println(text);
            }
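
If you only want the topic titles rather than the full text of each row, something along these lines might work. This is a sketch, not verified against the live page: it assumes that the first link in each solved row is the topic title, and the HTML string is a made-up, simplified fragment that only stands in for the real markup. The class name `SolvedTopicTitles` and the method `solvedTitles` are hypothetical names used for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SolvedTopicTitles {

    // Collects the text of the first link in every solved row inside #grid.
    // Assumption: the first <a> in a row is the topic title link.
    public static List<String> solvedTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element row : doc.select("#grid tr.lia-list-row-thread-solved")) {
            Element link = row.selectFirst("a");
            if (link != null) {
                titles.add(link.text());
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page is more complex.
        String html =
                "<table id=\"grid\">"
              + "<tr class=\"lia-list-row lia-list-row-thread-solved\">"
              + "<td><a href=\"/t5/a\">VLAN trunk issue</a></td><td>5 replies</td></tr>"
              + "<tr class=\"lia-list-row\">"
              + "<td><a href=\"/t5/b\">Unanswered question</a></td><td>0 replies</td></tr>"
              + "</table>";

        Document doc = Jsoup.parse(html);
        for (String title : solvedTitles(doc)) {
            System.out.println(title);
        }
    }
}
```

To run it against the real forum, replace `Jsoup.parse(html)` with the `Jsoup.connect(...).get()` call from above, and narrow the `"a"` selector if the first link in a row turns out not to be the title.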