I'm running a 16-node cluster (Riak 2.1.7).
I have 2 partitions which have been stuck in the "waiting to handoff" state for a long time.
According to riak-admin cluster partitions, the 2 partitions in question are of type secondary rather than primary.
All nodes seem to be up, but I'm still getting the following errors:
[error] <0.13307.2430>@riak_core_handoff_sender:start_fold:282 hinted transfer of riak_kv_vnode from 'riak@g37.xxx.com' 954828706420287160427993313561712246240057491456 to 'riak@g39.xxx.com' 954828706420287160427993313561712246240057491456 failed because of error:{badmatch,{error,closed}} [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,132}]}]
Any help please?
I've already tried repairing the partitions and restarting the node.
What happened before the partitions needed handoff? Do you have nodes leaving or joining the ring?
You might want to double-check your Riak KV version, as 2.1.7 does not exist. However, 2.0.7 and 2.9.7 do. If you execute riak version from the command line, it should tell you.
It sounds like you have a stuck set of transfers. The usual fix for this is to stop all transfers with riak-admin transfer-limit 0, wait for a few minutes, and then set the transfer limit back to whatever it was before. For example, to return to the default setting of 2, use riak-admin transfer-limit 2.
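As a rough sketch, the stop/wait/restore sequence could be wrapped in a small shell function. The function name reset_transfers and the variables RIAK_ADMIN, PAUSE_SECONDS and RESTORE_LIMIT are my own names for illustration, not part of Riak. The 300 second pause is just an assumed value for "a few minutes", and the restore value of 2 assumes you were on the default limit.

```shell
#!/bin/sh
# Sketch only: pause handoff cluster-wide, wait, then restore the limit.
# RIAK_ADMIN, PAUSE_SECONDS and RESTORE_LIMIT are illustrative names (assumptions).
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"   # path to the riak-admin binary
PAUSE_SECONDS="${PAUSE_SECONDS:-300}"    # assumed wait; "a few minutes"
RESTORE_LIMIT="${RESTORE_LIMIT:-2}"      # 2 is the default transfer limit

reset_transfers() {
    # Stop all transfers on every node.
    "$RIAK_ADMIN" transfer-limit 0 || return 1

    # Give in-flight handoffs time to wind down.
    sleep "$PAUSE_SECONDS"

    # Put the limit back to what it was before.
    "$RIAK_ADMIN" transfer-limit "$RESTORE_LIMIT" || return 1

    # Show what is still pending so you can see whether handoff resumed.
    "$RIAK_ADMIN" transfers
}
```

Source the file on any cluster node and run reset_transfers. If your limit was something other than 2 before, set RESTORE_LIMIT to that value first.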
Assuming you have an n-val of 3 or higher (the default setting is 3), you could potentially delete the content of the partitions on one node from disk and then repair them. That should clear anything that might be blocking them. Since you say they are not primary partitions, there should be no risk of data loss.