I'm having a very weird issue here, I have set-up my solr architecture as I supposed here and everything works fine except for that my Solr slave machine gets corrupt. The (test) set-up is as follows:
Solr Setup
The process
The whole proces goes like this:
Let's assume that the live core had 5000 documents and the reindex core 20000 after a complete rebuild.
The (wrong) result
After the swap this is the result:
Master: Live (20000), Reindex (5000) Slave: Live (20000), Reindex (20000)
It looks like the slave copies his reindex core to the live core but doesn't exchange the live core with the reindex core like the master does. The problem is I guess that the slave thinks it's newer then the master and the result is replication is dead on the reindex core.
Here are some log messages EXACTLY after a SWAP is issued:
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1328526603592, generation: 332
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1328526603596, generation: 336
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1328526603596, generation: 336
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1328526603592, generation: 332
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 1
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 11
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.tvx
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.fdt
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.frq
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/segments_9c
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.fdx
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.prx
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.tii
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.tvd
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.tis
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller$FileFetcher fetchPackets
WARNING: No content recieved for file: {name=segments_98, lastmodified=0, size=0}
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.tvf
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for /usr/share/solr/data/index/_ct.fnm
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 0 secs
13-mrt-2012 14:29:46 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 0 secs
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller modifyIndexProps
INFO: New index installed. Updating index properties...
13-mrt-2012 14:29:46 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@598d00 main
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@598d00 main from Searcher@189c12a main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@598d00 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@598d00 main from Searcher@189c12a main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=34,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-mrt-2012 14:29:46 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
13-mrt-2012 14:29:46 org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1069)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 10 more
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:36)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:68)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:243)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:76)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:73)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1058)
... 13 more
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@598d00 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=34,warmupTime=10,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-mrt-2012 14:29:46 org.apache.solr.core.SolrCore registerSearcher
INFO: [live] Registered new searcher Searcher@598d00 main
13-mrt-2012 14:29:46 org.apache.solr.search.SolrIndexSearcher close
INFO: Closing Searcher@189c12a main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=34,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-mrt-2012 14:29:46 org.apache.solr.handler.SnapPuller doCommit
INFO: Force open index writer to make sure older index files get deleted
13-mrt-2012 14:29:46 org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:36)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:68)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:243)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:309)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:517)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:306)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1596)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1418)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:191)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 10 more
Until step 6 everything is okay if I do everything manually. Everything collaps after the core swap.
I hope that someone could help me or telling me what I am doing wrong.. Should replication be disabled when you do a core swap? Any tips?
I managed to solve this issue, the problem was that both of my cores were using the same dataDir. By setting the dataDir in the solr.xml file to both different directories solved the problem!