I'm new to Nutch (2.2.1) and am trying to run it on Cygwin/Windows 7 with the latest version of Gora (0.5) so that I can persist data to a MongoDB (2.6) datastore. I changed nutch-site.xml to include my Mongo property, but I'm a little confused about the gora-mongodb.mapping.xml file that is also needed. I have two questions:
1) Do I need to create a Java class within the Nutch/Gora project and specify it in the class-name property of the gora-mongodb.mapping file, or will Gora create this for me? The documentation isn't very clear.
2) I created a sample file in my apache-nutch-2.2.1\runtime\local\conf folder and added the name of my MongoDB collection. When I run Nutch I get the following error:
$ ./nutch crawl urls -dir testCrawl -depth 3 -topN 5
cygpath: can't convert empty path
Exception in thread "main" org.apache.gora.util.GoraException: java.lang.IllegalStateException: A collection is not specified
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.IllegalStateException: A collection is not specified
at org.apache.gora.mongodb.store.MongoMappingBuilder.build(MongoMappingBuilder.java:77)
at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:168)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 8 more
Any help or clarification around this file would be appreciated.
You need 2 files in nutch/conf:
gora.properties: where you declare that you are going to use the MongoDB backend.
gora-mongodb-mapping.xml (notice the dash, not the dot you wrote): where you create a mapping between the names in Gora entities and the fields in the datastore.
I really don't think the version you are using is prepared to work with Gora 0.5, but give it a shot. Copy gora-mongodb-mapping.xml from Nutch-2.3-SNAPSHOT to nutch/conf/
If it does not work, try using Nutch-2.3-SNAPSHOT instead of 2.2.1.
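For reference, the "A collection is not specified" error in the stack trace is raised from MongoMappingBuilder.build, which suggests that the mapping Gora loaded has no collection name for the entity (for example because the file was not found under the expected name, or the attribute is missing). Below is a trimmed sketch of what the mapping file looks like, written from memory of the one that ships with Nutch 2.3, so treat the attribute names, field names, and types as assumptions and check them against the real file. The collection name goes in the document attribute, and the class is Nutch's own org.apache.nutch.storage.WebPage (the stack trace goes through StorageUtils.createWebStore), so you should not need to write a Java class yourself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of nutch/conf/gora-mongodb-mapping.xml (note the dash before "mapping"). -->
<!-- Assumption: layout recalled from Nutch 2.3; copy the full file from that release. -->
<gora-otd>
    <!-- "document" names the MongoDB collection; "webpage" is only an example value. -->
    <class name="org.apache.nutch.storage.WebPage"
           keyClass="java.lang.String"
           document="webpage">
        <!-- Each field maps a WebPage attribute to a field in the Mongo document. -->
        <field name="baseUrl" docfield="baseUrl" type="string"/>
        <field name="status" docfield="status" type="int32"/>
        <field name="fetchTime" docfield="fetchTime" type="int64"/>
        <!-- The remaining WebPage fields are omitted here. -->
    </class>
</gora-otd>
```

If the file is present with the dash in its name and the error persists, the mismatch between Nutch 2.2.1 and Gora 0.5 mentioned above is the next thing to suspect.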