[SOLVED] How to collect contact information from websites?

Does anyone know of a web crawler tool for collecting contact details from a website? Say I have a page such as www.website/contact and I want to pull out the address, phone number, etc. There are two tools I've been looking at: crawler4j, an open-source Java library, and Scrapy, an open-source Python framework. But I am finding them a bit hard to use for my scenario.

Any suggestions would be great. Thanks

Solution

You might search for "simple web crawler" to find the solution that fits you best. There are plenty of "pure Python" web crawlers on the net. Starting from such skeleton code, you add the database layer yourself. I think the hardest part would be setting up the database and saving the data into it.
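To give an idea of what such skeleton code plus a small database layer might look like, here is a minimal sketch using only the Python standard library. The names extract_contacts and save_contacts, the table layout, and both regular expressions are my own assumptions for illustration, not taken from any particular crawler. The patterns are deliberately loose and will miss or mis-match some formats, and postal addresses are not covered at all, since they rarely fit a simple pattern.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Loose, illustrative patterns (assumptions): tune them for the sites you target.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_contacts(html):
    """Return the e-mail addresses and phone-like strings found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted({match.strip() for match in PHONE_RE.findall(text)}),
    }


def save_contacts(conn, url, contacts):
    """Store one row per contact value in a hypothetical 'contacts' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (url TEXT, kind TEXT, value TEXT)"
    )
    rows = [(url, "email", value) for value in contacts["emails"]]
    rows += [(url, "phone", value) for value in contacts["phones"]]
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    conn.commit()
```

For a quick try-out you could pass sqlite3.connect(":memory:") as conn; for real runs, open a database file instead. SQLite is only one option here; any database with a Python driver would work the same way.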

What if there are millions of websites to crawl? Is there a way to crawl all the websites in my area?

That is no problem for a script. Just put the millions of addresses in a file (or several files) and open it for reading in Python or another scripting language. Then take the links from it one by one and crawl/scrape each to your liking. You might also want to save the results to a file (CSV, JSON).
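As a rough sketch of that loop, assuming a plain text file with one URL per line: the function names read_urls, fetch and crawl_to_csv and the CSV columns are made up for this example. It only records the page size, so replace len(page) with whatever scraping you actually need.

```python
import csv
import urllib.request


def read_urls(path):
    """Yield one URL per line, skipping blanks and '#' comment lines.

    Reading lazily means a file with millions of lines never sits in memory.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url and not url.startswith("#"):
                yield url


def fetch(url, timeout=10):
    """Download a page and return its text (needs real network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl_to_csv(url_file, out_file, fetch_page=fetch):
    """Fetch every URL in url_file and write one result row per URL to out_file.

    Returns the number of URLs processed.
    """
    done = 0
    with open(out_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "status", "size"])
        for url in read_urls(url_file):
            try:
                page = fetch_page(url)
                writer.writerow([url, "ok", len(page)])
            except Exception as exc:  # one bad site should not stop the whole run
                writer.writerow([url, f"error: {exc}", 0])
            done += 1
    return done
```

You would call it as crawl_to_csv("urls.txt", "results.csv"), where both file names are placeholders. This version fetches one page at a time, so for millions of sites you would probably want to split the list into several files and run a few copies in parallel, and be polite about request rates and robots.txt.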

I'd also recommend a ready-made simple Python crawler.