java, lucene, web-crawler, nutch, heritrix

Is it possible to integrate Nutch Crawler with my existing Lucene project?


I already have a project that uses Lucene 3.5.

Now I need to provide a web search function, but I don't want to import the whole Nutch project.

So I wonder whether I can use only the crawler part of Nutch to crawl websites and index them in Lucene's format.

I would then search the index files with my existing Lucene searcher.

Is this possible, or do you have any other suggestions (how about Heritrix)?


Solution

  • Yes, it is possible to search the index produced by Nutch with your own Lucene implementation. I wrote a short description in our project's wiki, where we use Nutch to crawl static content.

    You can have a look at it here: http://code.google.com/p/gtxcontentconnector/wiki/HowTo_Nutch

    br, Chris
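
To illustrate the idea, here is a minimal sketch of opening a Nutch-built index with plain Lucene 3.5 code. It is not taken from the linked wiki page and rests on several assumptions: that your Nutch version still writes a Lucene index readable by Lucene 3.5 (older 1.x releases did, later ones hand indexing over to Solr), that the index has an analyzed `content` field and a stored `url` field, and that `StandardAnalyzer` is close enough to the analyzer Nutch used at index time. The class name `NutchIndexSearch` and the method `searchUrls` are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class NutchIndexSearch {

    // Runs a query against the (assumed) "content" field and returns the
    // value of the (assumed) stored "url" field for each hit.
    static List<String> searchUrls(Directory dir, String queryText, int maxHits)
            throws IOException, ParseException {
        IndexReader reader = IndexReader.open(dir); // read-only reader
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            // Assumption: StandardAnalyzer approximates the index-time analysis.
            QueryParser parser = new QueryParser(Version.LUCENE_35, "content",
                    new StandardAnalyzer(Version.LUCENE_35));
            Query query = parser.parse(queryText);

            List<String> urls = new ArrayList<String>();
            for (ScoreDoc hit : searcher.search(query, maxHits).scoreDocs) {
                Document doc = searcher.doc(hit.doc);
                urls.add(doc.get("url")); // null if the field is not stored
            }
            return urls;
        } finally {
            searcher.close();
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the index directory written by the crawl
        // args[1]: query string
        Directory dir = FSDirectory.open(new File(args[0]));
        try {
            for (String url : searchUrls(dir, args[1], 10)) {
                System.out.println(url);
            }
        } finally {
            dir.close();
        }
    }
}
```

If the field names or the analyzer differ in your setup, inspecting the index with a tool such as Luke should show which fields are actually present and stored.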