
Elassandra single node cluster not starting. Stuck in Mutation Stage


I have a single-node Elassandra cluster running on a box. It crashed last night. The following was the last line of the systemctl status output:

Main process exited, code=killed, status=6/ABRT

Upon restart, however, it got stuck in the mutation stage, with lines like the following appearing repeatedly in the log (/var/log/cassandra/system.log):

2020-09-14 12:53:38,048 TRACE [MutationStage-31] ElasticSecondaryIndex.java:2158 readCellValue indexer=730313363 name=value kind=REGULAR type=text value={"dw1":{"6":{"total_weight":"72664168.50","total_count":"5979710.00","product_name":"Masala 12g","target_weight":"12","mean_weight":"12.15","th_packed_weight":"71756520.00","ega":"907648.50","ega_per":"1.26"}},"dw2":{"6":{"total_weight":"72654813.00","total_count":"5979710.00","product_name":"Masala 12g","target_weight":"12","mean_weight":"12.15","th_packed_weight":"71756520.00","ega":"898293.00","ega_per":"1.25"}}}

nodetool tpstats output:

Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0              0         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0              9         0                 0
MutationStage                         32       920        5808076         0                 0
MemtableReclaimMemory                  0         0              8         0                 0
GossipStage                            0         0              0         0                 0
SecondaryIndexManagement               0         0              0         0                 0
RequestResponseStage                   0         0              0         0                 0
ReadRepairStage                        0         0              0         0                 0
CounterMutationStage                   0         0              0         0                 0
MigrationStage                         0         0              0         0                 0
MemtablePostFlush                      0         0              8         0                 0
PerDiskMemtableFlushWriter_0           0         0              8         0                 0
ValidationExecutor                     0         0              0         0                 0
Sampler                                0         0              0         0                 0
MemtableFlushWriter                    0         0              8         0                 0
InternalResponseStage                  0         0              0         0                 0
ViewMutationStage                      0         0              0         0                 0
AntiEntropyStage                       0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

The pending count under MutationStage never goes to zero. It's been in this state for a long time. There are no other nodes in this cluster, and no data is being written to it right now.
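One way to tell a slow backlog from a truly stuck one is to sample the MutationStage counters a few times and compare the Completed values. The following is a minimal sketch, assuming `nodetool` is on the PATH and that the output keeps the column order shown above; the helper names `parse_pool` and `watch` are made up for illustration.

```python
import subprocess
import time


def parse_pool(tpstats_text, pool_name="MutationStage"):
    """Return (active, pending, completed) for one pool, or None if absent."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows look like: <name> <active> <pending> <completed> ...
        if len(fields) >= 4 and fields[0] == pool_name:
            return int(fields[1]), int(fields[2]), int(fields[3])
    return None


def watch(interval_s=30, samples=5):
    """Print the MutationStage counters `samples` times, interval_s apart."""
    for _ in range(samples):
        result = subprocess.run(
            ["nodetool", "tpstats"],
            capture_output=True, text=True, check=True,
        )
        print(parse_pool(result.stdout))
        time.sleep(interval_s)
```

Calling `watch()` on the node prints one tuple per sample. If Completed keeps rising between samples, mutations are still being processed, even though Pending stays above zero.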


Solution

  • To me, the symptoms you describe indicate that mutations are being replayed from the commitlog.

    You can work around it by:

    1. Shutting Cassandra down temporarily.
    2. Moving the contents of the commitlog/ directory to another directory.
    3. Starting Cassandra again.
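The three steps above can be sketched as a small script. This is only an illustration under stated assumptions: the systemd unit is assumed to be called `cassandra`, the paths are passed in by the caller because the commitlog location depends on `commitlog_directory` in cassandra.yaml, and both function names are hypothetical. The segments are moved rather than deleted, so they can still be put back later if the writes they contain are needed.

```python
import shutil
import subprocess
from pathlib import Path


def move_commitlog_segments(commitlog_dir, backup_dir):
    """Move every file in commitlog_dir into backup_dir; return the moved names."""
    src = Path(commitlog_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(dst / entry.name))
            moved.append(entry.name)
    return moved


def run_workaround(commitlog_dir, backup_dir, service="cassandra"):
    """Stop the service, set the commitlog segments aside, start it again.

    The service name is an assumption; adjust it to the unit on your box.
    """
    subprocess.run(["systemctl", "stop", service], check=True)
    moved = move_commitlog_segments(commitlog_dir, backup_dir)
    subprocess.run(["systemctl", "start", service], check=True)
    return moved
```

`run_workaround` needs enough privileges to control the service and to write to both directories. Nothing is executed on import, so `move_commitlog_segments` can be tried on a scratch directory first.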