I have successfully crawled a website using Nutch, and now I want to create a WARC from the results. However, both the warc and commoncrawldump commands fail, while bin/nutch dump -segment ... runs successfully on the same segment folder.
I am using Nutch 1.17 and running:
bin/nutch commoncrawldump -outputDir output/ -segment crawl/segments
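For reference, the WARC export in Nutch 1.x goes through the WARCExporter tool. To the best of my recollection its invocation looks like the line below, but treat the exact signature as an assumption and check the usage string printed by running bin/nutch warc with no arguments:

# Assumed WARCExporter usage in Nutch 1.x; verify with `bin/nutch warc`
bin/nutch warc output/ -dir crawl/segments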
The error in hadoop.log is ERROR tools.CommonCrawlDataDumper - No segment directories found in my/path/, despite my having just run a crawl there.
It turned out that the segments folder still contained segments from a previous crawl, and those were what triggered the error. They did not contain all of the segment data, presumably because that crawl was cancelled or ended early, and this caused the entire process to fail. Deleting those stale segment directories and starting a fresh crawl fixed the issue.
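For anyone hitting the same error: a fully parsed Nutch segment normally contains the subdirectories crawl_generate, crawl_fetch, content, crawl_parse, parse_data, and parse_text. A minimal sketch for flagging incomplete segments before dumping, assuming the segments live under crawl/segments/ as in the command above:

# Flag segment directories missing any subdirectory that a fully
# parsed Nutch segment normally contains; these are deletion candidates.
for seg in crawl/segments/*/; do
  for sub in crawl_generate crawl_fetch content crawl_parse parse_data parse_text; do
    [ -d "$seg$sub" ] || echo "incomplete segment: $seg (missing $sub)"
  done
done

Note this is only a heuristic: a fetch-only segment legitimately lacks the parse subdirectories, so only delete segments you know came from an aborted crawl.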