web-crawler stormcrawler

Completion event for crawling all sub-URLs of a specific base URL in StormCrawler


I am currently working on a StormCrawler-based project. I need to do some processing once all of the sub-URLs of a given base URL have been crawled. For example, I want to change a status when all of the URLs discovered for that domain have been crawled, whether successfully or with an error. How can I detect a completion event for each base URL?


Solution

  • Not out of the box, no. You would have to implement a mechanism yourself that checks whether any unfetched URLs are left for a given key.
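
As a rough illustration of such a mechanism, here is a minimal in-memory sketch in plain Java. It is not part of StormCrawler: the class `CrawlCompletionTracker` and its methods are hypothetical names, and it assumes your own code calls `onDiscovered` whenever a URL is discovered for a key (for example a hostname) and `onFinished` whenever a URL reaches a final state, fetched or failed. In a real topology the status is usually persisted in a backend, so you would more likely query that backend for the number of unfetched URLs per key instead of keeping the state in memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, NOT a StormCrawler class: tracks unfinished URLs per key. */
final class CrawlCompletionTracker {

    // key (e.g. hostname) -> URLs discovered but not yet fetched or failed
    private final Map<String, Set<String>> pending = new HashMap<>();
    // key -> URLs already finished, so a rediscovery does not reopen them
    private final Map<String, Set<String>> finished = new HashMap<>();

    /** Call when a URL is discovered for the given key (assumed hook in your code). */
    synchronized void onDiscovered(String key, String url) {
        if (finished.getOrDefault(key, Collections.emptySet()).contains(url)) {
            return; // already processed, ignore the rediscovery
        }
        pending.computeIfAbsent(key, k -> new HashSet<>()).add(url);
    }

    /**
     * Call when a URL has been fetched or has failed.
     * Returns true if this call removed the last pending URL for the key.
     */
    synchronized boolean onFinished(String key, String url) {
        finished.computeIfAbsent(key, k -> new HashSet<>()).add(url);
        Set<String> urls = pending.get(key);
        if (urls == null) {
            return false; // nothing was pending for this key
        }
        urls.remove(url);
        if (urls.isEmpty()) {
            pending.remove(key);
            return true; // no unfinished URLs left: trigger your processing here
        }
        return false;
    }

    /** Number of URLs still waiting to be fetched for the key. */
    synchronized int pendingCount(String key) {
        return pending.getOrDefault(key, Collections.emptySet()).size();
    }
}
```

Note that "no pending URLs" only means that nothing is left right now. If a page is reported as finished before its outlinks have been registered as discovered, the key can look complete too early, so register the outlinks first or wait for a quiet period before changing the status.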