Cassandra repair is failing on node 1 with the error below. Earlier, I started multiple repair sessions in parallel by mistake. I found a bug, https://issues.apache.org/jira/browse/CASSANDRA-11824, which was resolved for the same scenario, but I am already using Cassandra 3.9. Is running nodetool scrub the only workaround? Are there any considerations to keep in mind before running scrub, given that I need to run it directly on production?
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.cassandra.exceptions.RepairException: [repair #6546ce10-3a70-11ec-9336-394ae1cd743d on test/test_config, [(-1879129450237588992,-1867793788349541955], (-1228457230064908637,-1228389616821781301], (583169750278890460,583583127041100026]]] Validation failed in /10.11.22.123
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1525) ~[guava-18.0.jar:na]
On node 2 (10.11.22.123), the log shows:
ERROR 17:33:12 Cannot start multiple repair sessions over the same sstables
ERROR 17:33:12 Failed creating a merkle tree for [repair #6546ce10-3a70-11ec-9336-394ae1cd743d on test/test_config, [(-1879129450237588992,-1867793788349541955], (-1228457230064908637,-1228389616821781301], (583169750278890460,583583127041100026]]], /10.11.22.789(node 1) (see log for details)
ERROR 17:33:12 Exception in thread Thread[ValidationExecutor:10,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
at org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.markSSTablesRepairing(ActiveRepairService.java:526) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1318) ~[apache-cassandra-3.9.jar:3.9]
nodetool tpstats showed that there were indeed active repair jobs, but they were not actually running: nodetool compactionstats did not show any running jobs. So I restarted only the nodes on which the repair was stuck. This cleared the stuck repair jobs, and I was able to run a fresh repair afterwards.
nodetool tpstats
Pool Name                 Active   Pending    Completed   Blocked   All time blocked
MutationStage                  0         0    323161614         0                  0
ViewMutationStage              0         0            0         0                  0
ReadStage                      0         0    339671804         0                  0
RequestResponseStage           0         0    440712393         0                  0
ReadRepairStage                0         0     13751257         0                  0
CounterMutationStage           0         0            0         0                  0
Repair#3                       1      3525            3         0                  0
.....
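To check several nodes for the same symptom, the Repair row in the tpstats output above can be picked out with a small script. This is only a rough sketch: the function name find_busy_repair_pools and the embedded SAMPLE text are made up for illustration, and it assumes the column order shown above (pool name, Active, Pending, ...). A non-zero count alone does not prove a repair is stuck; as described above, it is the combination with an empty nodetool compactionstats that pointed to stuck sessions.

```python
def find_busy_repair_pools(tpstats_output):
    """Return (pool, active, pending) for each Repair#N pool with outstanding work.

    Assumes the layout shown above: pool name first, then Active and Pending.
    """
    busy = []
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Only rows such as "Repair#3 1 3525 3 0 0" are of interest here.
        if len(fields) < 3 or not fields[0].startswith("Repair#"):
            continue
        try:
            active, pending = int(fields[1]), int(fields[2])
        except ValueError:
            # Not a numeric row; skip it rather than guess.
            continue
        if active > 0 or pending > 0:
            busy.append((fields[0], active, pending))
    return busy


# Hypothetical sample, copied from the output above. On a real node the text
# would come from running "nodetool tpstats" and capturing its output.
SAMPLE = """\
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 323161614 0 0
ReadStage 0 0 339671804 0 0
Repair#3 1 3525 3 0 0
"""

if __name__ == "__main__":
    for pool, active, pending in find_busy_repair_pools(SAMPLE):
        print(f"{pool}: active={active} pending={pending}")
```

Run against the sample, this prints "Repair#3: active=1 pending=3525", the same row that stood out in my output.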