My indexer, using Lucene, seems to crash during indexing operations after writing an index file approximately 16GB in size.
The stack trace written to the console is repeated three times, for reasons I don't know. For brevity I've only supplied the part that's repeated. Here's the stack trace as written to the console by Lucene:
Lucene.Net.Index.MergePolicy+MergeException: Exception of type 'Lucene.Net.Index.MergePolicy+MergeException' was thrown. --->
System.IO.FileNotFoundException: Could not find file 'PATH_TO_MY_INDEX_DIRECTORY\_xx.cfs'.
File name: 'PATH_TO_MY_INDEX_DIRECTORY\_xx.cfs'
at Lucene.Net.Index.IndexWriter.HandleMergeException(Exception t, OneMerge merge)
at Lucene.Net.Index.IndexWriter.Merge(OneMerge merge)
at Lucene.Net.Index.ConcurrentMergeScheduler.MergeThread.Run()
--- End of inner exception stack trace ---
at Lucene.Net.Index.ConcurrentMergeScheduler.HandleMergeException(Exception exc)
at Lucene.Net.Index.ConcurrentMergeScheduler.MergeThread.Run()
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
When I open the generated index with the Java edition of Luke, the index is deleted. I presume this is because it's corrupted (the "write.lock" file remains, for example), though it could be a bug in Luke or a misconfiguration of it.
Creating this index takes approximately 36 hours and I'm not keen on having to do it a third time (this isn't the first time it's happened).
I have no idea what's causing this. What can I do?
I'm using Lucene.net 2.9.2 because it's the last version that was built for .NET 3.5.
I realised that this was caused by writing too much to the index without calling Commit. I modified my code to call Commit after writing about 10MB of data. I haven't had the exception since, and if it does crash I don't need to rebuild the entire 36GB index, just the last 10MB.
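The periodic commit described above could be sketched roughly as follows. This is a minimal illustration, not the code I actually run: CommitBatcher is a hypothetical helper (it is not part of Lucene.Net), and the byte counts passed to it are assumed to be your own estimate of each document's size.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net: it adds up the bytes handed to
// the index and invokes a commit callback once a threshold is reached.
public sealed class CommitBatcher
{
    private readonly long thresholdBytes;
    private readonly Action commit;
    private long pendingBytes;

    // Number of times the commit callback has been invoked.
    public int CommitCount { get; private set; }

    public CommitBatcher(long thresholdBytes, Action commit)
    {
        if (thresholdBytes <= 0) throw new ArgumentOutOfRangeException("thresholdBytes");
        if (commit == null) throw new ArgumentNullException("commit");
        this.thresholdBytes = thresholdBytes;
        this.commit = commit;
    }

    // Call after each document is added, passing an estimate of its size.
    public void Added(long documentBytes)
    {
        pendingBytes += documentBytes;
        if (pendingBytes >= thresholdBytes) Flush();
    }

    // Commits whatever is pending; call once more before closing the writer.
    public void Flush()
    {
        if (pendingBytes == 0) return;
        commit();
        pendingBytes = 0;
        CommitCount++;
    }
}
```

Assuming writer is the IndexWriter, it would be wired up with something like new CommitBatcher(10L * 1024 * 1024, writer.Commit), calling Added(...) after each writer.AddDocument(doc) and Flush() before writer.Close(). The threshold is a trade-off: a smaller value means less work lost after a crash but more frequent commits.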