I have a single-node Elassandra cluster running on a box. It crashed last night. The following was the last line of the systemctl status output:
Main process exited, code=killed, status=6/ABRT
Upon restart, however, it got stuck in the mutation stage, with lines like the following appearing repeatedly in the log (/var/log/cassandra/system.log):
2020-09-14 12:53:38,048 TRACE [MutationStage-31] ElasticSecondaryIndex.java:2158 readCellValue indexer=730313363 name=value kind=REGULAR type=text value={"dw1":{"6":{"total_weight":"72664168.50","total_count":"5979710.00","product_name":"Masala 12g","target_weight":"12","mean_weight":"12.15","th_packed_weight":"71756520.00","ega":"907648.50","ega_per":"1.26"}},"dw2":{"6":{"total_weight":"72654813.00","total_count":"5979710.00","product_name":"Masala 12g","target_weight":"12","mean_weight":"12.15","th_packed_weight":"71756520.00","ega":"898293.00","ega_per":"1.25"}}}
tpstats output:
Pool Name                     Active  Pending  Completed  Blocked  All time blocked
ReadStage                     0       0        0          0        0
MiscStage                     0       0        0          0        0
CompactionExecutor            0       0        9          0        0
MutationStage                 32      920      5808076    0        0
MemtableReclaimMemory         0       0        8          0        0
GossipStage                   0       0        0          0        0
SecondaryIndexManagement      0       0        0          0        0
RequestResponseStage          0       0        0          0        0
ReadRepairStage               0       0        0          0        0
CounterMutationStage          0       0        0          0        0
MigrationStage                0       0        0          0        0
MemtablePostFlush             0       0        8          0        0
PerDiskMemtableFlushWriter_0  0       0        8          0        0
ValidationExecutor            0       0        0          0        0
Sampler                       0       0        0          0        0
MemtableFlushWriter           0       0        8          0        0
InternalResponseStage         0       0        0          0        0
ViewMutationStage             0       0        0          0        0
AntiEntropyStage              0       0        0          0        0
CacheCleanupExecutor          0       0        0          0        0

Message type        Dropped
READ                0
RANGE_SLICE         0
_TRACE              0
HINT                0
MUTATION            0
COUNTER_MUTATION    0
BATCH_STORE         0
BATCH_REMOVE        0
REQUEST_RESPONSE    0
PAGED_RANGE         0
READ_REPAIR         0
The pending count under MutationStage never goes to zero. It has been in this state for a long time. There are no other nodes in this cluster, and no data is being written to it right now.
To me, the symptoms you described indicate that mutations are being replayed from the commitlog.
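If you want to check whether the replay is actually making progress rather than hanging, one option is to take two tpstats samples a few minutes apart and compare the Completed counter for MutationStage. Here is a rough Python sketch of that idea. The function names are my own (hypothetical), and it assumes the six-column pool rows shown in the question; the sample text is just the MutationStage row from the question.

```python
def parse_tpstats_pool(tpstats_text, pool_name):
    """Return the counters for one thread pool from tpstats text, or None if absent."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows have six fields: name, active, pending, completed,
        # blocked, all time blocked. Header and message-type rows do not match.
        if len(fields) == 6 and fields[0] == pool_name:
            active, pending, completed, blocked, all_time_blocked = map(int, fields[1:])
            return {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return None


def replay_is_progressing(earlier, later):
    """True if the pool completed more tasks between the two samples."""
    return later["completed"] > earlier["completed"]


# Sample row copied from the question's tpstats output.
sample = "MutationStage 32 920 5808076 0 0"
stats = parse_tpstats_pool(sample, "MutationStage")
print(stats["pending"])  # prints 920
```

In practice you would feed it the text of two real tpstats runs. If Completed keeps climbing, the node is still working through the commitlog; if it stays flat while Pending is non-zero, the replay is probably stuck.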
You can work around it by moving the contents of commitlog/ to another directory.
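As a minimal sketch of that move, something like the following Python would do it. The function name and both directory arguments are placeholders of mine; use the commitlog location configured for your install, and run it only while the node is stopped. It moves the files rather than deleting them, so the segments stay available if you want to put them back later.

```python
import shutil
from pathlib import Path


def move_commitlog_segments(commitlog_dir, backup_dir):
    """Move every regular file in commitlog_dir into backup_dir.

    Both paths are supplied by the caller (hypothetical arguments, not
    defaults taken from any particular install). Returns the moved file names.
    """
    src = Path(commitlog_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the backup directory if needed

    moved = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(dst / entry.name))
            moved.append(entry.name)
    return moved
```

You would call it as `move_commitlog_segments(<your commitlog dir>, <some backup dir>)` and then start the node again. With the commitlog directory empty there is nothing left to replay on startup.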