I am trying to extract the main content of an article from an HTML page using boilerpipe.
I downloaded the latest jars from here.
I am using the following code:
String article = "";
try {
    article = ArticleExtractor.INSTANCE.getText(url);
    System.out.println("Article ++++ >>" + article);
} catch (BoilerpipeProcessingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
But this returns an empty string for every URL. Can anyone help me with this?
Have you tried passing the HTML itself instead of the URL? Or maybe there is a problem with the way your URL strings are formatted.
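
One way to rule out the formatting problem is to turn the string into a java.net.URL before handing it to the extractor. The sketch below uses only the standard library; the class name UrlCheck, the helper toUrl and the example address are made up for illustration. It assumes that `url` in the question is a String. If so, my understanding is that boilerpipe's getText(String) overload treats its argument as HTML markup rather than as an address, which could explain the empty result, but I have not checked this against the version you downloaded.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlCheck {
    // Hypothetical helper: converts a string to a java.net.URL and fails
    // loudly on malformed input instead of letting it pass silently.
    static URL toUrl(String s) throws URISyntaxException, MalformedURLException {
        // new URI(...) rejects illegal characters such as unescaped spaces.
        URI uri = new URI(s.trim());
        // A string like "www.example.com/page" has no scheme, so it is not absolute.
        if (!uri.isAbsolute()) {
            throw new MalformedURLException("Missing scheme (e.g. http://): " + s);
        }
        return uri.toURL();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address, not taken from the question.
        URL url = toUrl("http://example.com/article.html");
        System.out.println(url.getProtocol() + " " + url.getHost());
        // With the boilerpipe jars on the classpath, the URL object (not the
        // string) would then be passed on, for example:
        // String article = ArticleExtractor.INSTANCE.getText(url);
    }
}
```

If toUrl throws for your inputs, the strings are the problem. If it succeeds and the extractor still returns nothing, download the page yourself and pass the HTML text, as suggested above.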