[SOLVED] Is there an open, easily extensible web crawler?

I am looking for a web crawler that is mature and can be easily extended. I am interested in the following features, or at least in the possibility of extending the crawler to support them:

partly, just reading the feeds of several sites
scraping the content of these sites
if a site has an archive, crawling and indexing it as well
the crawler should be able to explore part of the Web for me and decide which sites match the given criteria
it should be able to notify me when something that may match my interests is found
the crawler should not overload servers with too many requests; it should crawl politely
the crawler should be robust against misbehaving sites and servers

Each of these things can be done individually without much effort, but I am interested in a solution that provides a customisable, extensible crawler. I have heard of Apache Nutch, but I am very unsure about the project so far. Do you have any experience with it? Can you recommend alternatives?

Solution

A quick search on GitHub turned up Anemone, a web spider framework written in Ruby that seems to fit your requirements, particularly extensibility.
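
Whichever framework you end up with, the "don't kill the servers" requirement usually comes down to two checks before each request: does robots.txt allow the URL, and has enough time passed since the last hit on that host. Here is a rough sketch of that logic in Python, standard library only. PoliteGate, its parameters and the example.com URLs are made-up names for illustration; this is not how Anemone or Nutch implement it.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Decides whether and when a URL may be fetched (hypothetical helper)."""

    def __init__(self, user_agent="example-crawler", min_delay=2.0,
                 clock=time.monotonic):
        self.user_agent = user_agent
        self.min_delay = min_delay  # assumed minimum seconds between hits on one host
        self.clock = clock          # injectable so the logic can be tested offline
        self.last_hit = {}          # host -> time of the last request
        self.robots = {}            # host -> parsed robots.txt rules

    def load_robots(self, host, robots_txt):
        """Register a robots.txt body that was already downloaded for a host."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        """True if robots.txt permits the URL, or if no rules are known for its host."""
        parser = self.robots.get(urlparse(url).netloc)
        return parser is None or parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to sleep before the next request to this URL's host."""
        last = self.last_hit.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record(self, url):
        """Note that a request to the URL's host was just made."""
        self.last_hit[urlparse(url).netloc] = self.clock()
```

In a crawl loop you would skip any URL for which allowed(url) is false, sleep for wait_time(url), fetch the page, and then call record(url). Treating a host with no loaded robots.txt as allowed is a simplification made for this sketch; a real crawler would fetch robots.txt before the first request to each host.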
Hope it goes well!