web-crawler crawler4j

How to crawl my site to detect 404/500 errors?


Is there any fast (maybe multi-threaded) way to crawl my site (clicking on all local links) to look for 404/500 errors (i.e. ensure 200 response)?

I also want to be able to set it to only click into 1 of each type of link. So if I have 1000 category pages, it only clicks into one.

Is http://code.google.com/p/crawler4j/ a good option?

I'd like something that is super easy to set up, and I'd prefer PHP over Java (though if Java is significantly faster, that would be ok).


Solution

  • You can use Xenu's Link Sleuth, an old but stable tool, to crawl your site.

    You can configure it to use 100 threads and sort the results by status code (500, 404, 200, 403).
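
If you would rather script the check yourself, here is a minimal sketch in Python (standard library only) of a threaded crawl that records the status code of every local page it reaches. It is an illustration, not a tested tool: `link_type`, `fetch`, and `crawl` are names made up for this example, and the rule that URLs differing only in digits count as the same "type" of link is an assumption you would adapt to your own URL scheme.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def link_type(url):
    """Assumed grouping rule: paths that differ only in digits share a type,
    so /category/1 and /category/2 both map to /category/N."""
    return re.sub(r"\d+", "N", urlparse(url).path)


def fetch(url):
    """Return (status_code, html_body); status 0 means the request itself failed."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, ""
    except URLError:
        return 0, ""


def crawl(start_url, fetch=fetch, workers=20):
    """Crawl local links level by level; return a dict of {url: status_code}."""
    host = urlparse(start_url).netloc
    seen_types = {link_type(start_url)}
    results = {}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch the current level in parallel, then build the next level.
            pages = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, (status, body) in zip(frontier, pages):
                results[url] = status
                parser = LinkParser()
                parser.feed(body)
                for href in parser.links:
                    link = urldefrag(urljoin(url, href)).url
                    if urlparse(link).netloc != host:
                        continue  # skip external, mailto:, and similar links
                    kind = link_type(link)
                    if kind not in seen_types:
                        # Visit only the first link of each type.
                        seen_types.add(kind)
                        next_frontier.append(link)
            frontier = next_frontier
    return results
```

To use it, call `crawl("http://your-site.example/")` with your own start URL and keep only the entries whose status is not 200. Threads are a reasonable fit here because the work is almost entirely waiting on the network; raise or lower `workers` depending on how much load your server can take.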