Does anyone know a web crawler tool for collecting contact details from a website? Say I have a www.website/contact.. I want to pull out the address, phone number, etc.. There are 2 tools I've been looking at: cralwer4j opensource jar for java and Scrapy opensource in Python. But I am finding it a bit hard to use for my scenario.
Any suggestions would be great. Thanks
You might google for "simple web crawler" to find a solution that fits you best. In the net there are plenty "pure python" based web crawlers. Based on sceleton code you add db wrap up. I think the most problem would be db setting and saving data in it.
What if there are 1000000s of websites to crawl.. Is there a way to crawl all websites in my are?
No problem for scripting. Just put millions addresses in a file (or files), open it for reading in python or other script. Then get link by link from it and crawl/scrape to your pleasure. Result you might also want to save in file (csv, json).
I'd also recommend you a ready simple python crawler.