java ssl https crawler4j

Crawl HTTPS pages with crawler4j


For months we have used crawler4j to crawl an HTTPS site. Since last Friday, we have suddenly been unable to crawl that same site. Has something changed in the HTTPS protocol? The site is https://enot.publicprocurement.be/enot-war/home.do

As a test, just try to fetch the page title, which should be: Welkom op het platform e-Notification
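
To separate a crawler4j problem from a plain TLS problem, the title test above can be tried without crawler4j at all. The following is only a minimal sketch using the JDK's own HttpsURLConnection; the class name TitleProbe, the 10-second timeouts, and the UTF-8 decoding are assumptions made for illustration and are not part of crawler4j. If this probe also fails with an SSL exception, the cause is more likely in the JVM/server handshake than in the crawler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper class, not part of crawler4j.
public class TitleProbe {

    // Matches the first <title>...</title> element, ignoring case and line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if the HTML has no <title> element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Downloads the page body over HTTPS with the JVM's default SSL settings.
    // Assumption: the page is decoded as UTF-8.
    static String fetch(String address) throws IOException {
        HttpsURLConnection conn =
                (HttpsURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(10_000); // assumed timeout in milliseconds
        conn.setReadTimeout(10_000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0
                ? args[0]
                : "https://enot.publicprocurement.be/enot-war/home.do";
        // An SSLException thrown here points at the handshake, not at the crawler.
        System.out.println(extractTitle(fetch(url)));
    }
}
```

Running it with `-Djavax.net.debug=ssl` makes the JVM log the handshake, which usually shows at which step the connection is rejected.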

Any help is much appreciated.


Solution

  • I had the same issue. To fix it, you need a customized PageFetcher. A sample is available here: http://code.google.com/p/crawler4j/issues/detail?id=174
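
Before swapping in a custom PageFetcher, it may help to check which SSL/TLS protocol versions the local JVM offers, since one possible cause (an assumption, not confirmed for this site) is that the server stopped accepting a version the client was using. The sketch below uses only the JDK; the class name TlsProtocols is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic class, not part of crawler4j.
public class TlsProtocols {

    // Protocol versions this JVM is able to use at all.
    static List<String> supported() {
        return Arrays.asList(defaultContext().getSupportedSSLParameters().getProtocols());
    }

    // Protocol versions the JVM enables when no explicit configuration is given.
    static List<String> enabledByDefault() {
        return Arrays.asList(defaultContext().getDefaultSSLParameters().getProtocols());
    }

    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; rethrown unchecked to keep callers simple.
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Supported:          " + supported());
        System.out.println("Enabled by default: " + enabledByDefault());
    }
}
```

If the server requires a version that appears under "Supported" but not under "Enabled by default", the client's SSL configuration is a reasonable place to look next.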