I want to use Boilerpipe to extract text from news pages on several websites. The problem is that every time I try, I get a ConnectException. I used the example syntax from the Boilerpipe quickstart guide:
URL url = new URL("http://www.telegraph.co.uk/news/health/11523739/Nine-in-10-GPs-say-no-to-seven-day-opening.html");
String text = ArticleExtractor.INSTANCE.getText(url);
And here is the connection error:
de.l3s.boilerpipe.BoilerpipeProcessingException: java.net.ConnectException: Connection refused: connect
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:89)
at extract.Test.main(Test.java:14)
Caused by: java.net.ConnectException: Connection refused: connect
I tried many different sites, but I always get the same error.
How can I solve this problem, or at least find out where it comes from? (Maybe a firewall, or the port configuration...)
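One way to narrow this down is to take Boilerpipe out of the picture and test the raw TCP connection. The following is only a minimal sketch using the standard java.net classes; the class name ConnectivityCheck, the method canConnect and the 5-second timeout are made up for illustration. If this also fails with "Connection refused", the problem is in the network (firewall, proxy, blocked port) and not in Boilerpipe.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Boilerpipe.
class ConnectivityCheck {

    /** Returns true if a plain TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException, SocketTimeoutException and UnknownHostException all end up here.
            System.err.println("Cannot reach " + host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Host taken from the URL in the question; port 80 because the URL is plain http.
        boolean ok = canConnect("www.telegraph.co.uk", 80, 5000);
        System.out.println(ok ? "TCP connection works" : "TCP connection blocked or refused");
    }
}
```

If the same host opens fine in a browser on the same machine, the browser is probably going through a proxy that the Java process does not know about.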
After further research, I found out that a firewall in the enterprise network was blocking those requests.
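In case it helps someone in the same situation: if the enterprise network provides an HTTP proxy (an assumption, ask your network administrators), the standard Java networking properties can be set before calling the extractor. This is only a sketch. The class ProxySetup, the host proxy.example.com and the port 8080 are placeholders, and I am assuming that getText(URL) goes through the normal java.net.URL machinery, which honors these properties.

```java
// Hypothetical helper, not part of Boilerpipe.
class ProxySetup {

    /** Points the JVM's default HTTP and HTTPS handlers at the given proxy. */
    static void useHttpProxy(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    public static void main(String[] args) {
        // Placeholder values: replace with the proxy of your own network.
        useHttpProxy("proxy.example.com", 8080);
        System.out.println("http.proxyHost = " + System.getProperty("http.proxyHost"));
        // After this point, connections opened through java.net.URL use the proxy.
    }
}
```

The same properties can also be passed on the command line with -Dhttp.proxyHost=... and -Dhttp.proxyPort=..., which avoids changing the code.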