java, comments, excel-2013, crawler4j, jericho-html-parser

How to retrieve all the user comments from a site?


I want to collect all the user comments from this site: http://www.consumercomplaints.in/?search=chevrolet

The problem is that each comment is only partially displayed. To see the complete comment I have to click on the title above it, and this has to be repeated for every comment.

The other problem is that there are many pages of comments.

So I want to store all the complete comments from the site above in an Excel sheet. Is this possible? I am thinking of using crawler4j and Jericho along with Eclipse.

My code for the visit method:

    // Needs java.io.BufferedWriter, File, FileWriter and IOException,
    // plus crawler4j's Page and HtmlParseData.
    @Override
    public void visit(Page page) {
        String url = page.getWebURL().getURL();
        System.out.println("URL: " + url);

        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String html = htmlParseData.getHtml();

            // Set<WebURL> links = htmlParseData.getOutgoingUrls();
            // String text = htmlParseData.getText();

            // The path below is a directory, so a file name is needed inside it;
            // "pages.html" is only a placeholder name.
            File outputDir = new File("/DA Project/HTML Source/");
            outputDir.mkdirs(); // create the directory if it is missing
            File outputFile = new File(outputDir, "pages.html");

            // true = append; FileWriter creates the file if it does not exist.
            // try-with-resources closes the writer exactly once, so the HTML
            // is written once and nothing is written to a closed stream.
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile, true))) {
                writer.write(html);
            } catch (IOException e) {
                System.out.println("IOException : " + e.getMessage());
                e.printStackTrace();
            }

            System.out.println("Html length: " + html.length());
        }
    }

Thanks in advance. Any help would be appreciated.


Solution

  • Yes, it is possible.

    Hope this helps.
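One part of the task, getting the collected comments into a file that Excel can open, could be sketched roughly as follows. This is only an illustration that assumes a CSV file is an acceptable stand-in for a native Excel sheet; the class name `CommentCsvWriter`, the three columns (URL, title, comment) and the file name `comments.csv` are all made up for the example and do not come from the question. It uses only the Java standard library, so it leaves out the crawling and HTML parsing entirely.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends one comment per row to a CSV file.
class CommentCsvWriter {

    // Wrap a field in double quotes and double any embedded quotes,
    // so commas and line breaks inside a comment stay in one cell.
    static String escape(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build one CSV row from the (assumed) three columns.
    static String toCsvRow(String url, String title, String comment) {
        return escape(url) + "," + escape(title) + "," + escape(comment);
    }

    // Append a row, creating the file on first use.
    static void appendRow(Path csvFile, String url, String title, String comment)
            throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                csvFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(toCsvRow(url, title, comment));
            writer.write("\r\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("comments.csv"); // placeholder file name
        appendRow(out, "http://example.com/complaint-1", "Example title",
                "Full comment text, with a comma and a \"quote\".");
    }
}
```

In a crawler, something like `appendRow` would presumably be called from `visit` once the full comment text has been extracted from a detail page, in place of dumping the raw HTML. If several crawler threads run at once, the calls would need to be synchronized.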