I have a 5-node Riak cluster running version 2.9.10. All nodes are on the same version.
I attempted to remove one node from the cluster to upgrade its disk. However, the node became stuck while handing off its last partition to another node in the cluster. Unfortunately, the receiving node crashed with an unrecoverable error. As a result, the Riak service on the stuck node automatically shut down. Here is the console log.
```
2024-06-08 07:35:26.278 [error] <0.801.0>@riak_kv_vnode:init:856 Failed to start riak_kv_eleveldb_backend backend for index 959110449498405040071168171470060731649205731328 error: {db_open,"Corruption: truncated record at end of file"}
2024-06-08 07:35:26.283 [notice] <0.801.0>@riak:stop:43 "backend module failed to start."
2024-06-08 07:35:26.283 [error] <0.801.0> gen_fsm <0.801.0> in state started terminated with reason: no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380
2024-06-08 07:35:26.283 [error] <0.801.0> CRASH REPORT Process <0.801.0> with 1 neighbours exited with reason: no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380 in gen_fsm:terminate/7 line 600
2024-06-08 07:35:26.283 [error] <0.170.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.801.0> exit with reason no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380 in context child_terminated
2024-06-08 07:35:26.284 [error] <0.167.0> Supervisor riak_core_sup had child riak_core_vnode_manager started with riak_core_vnode_manager:start_link() at <0.215.0> exit with reason {{function_clause,[{riak_kv_vnode,terminate,[{bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}},undefined],[{file,"src/riak_kv_vnode.erl"},{line,2380}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,941}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},{gen_fsm,sync_send_event,[<0.801.0>,wait_for_init,infinity]}} in context child_terminated
2024-06-08 07:35:26.285 [info] <0.385.0>@riak_kv_app:prep_stop:267 Stopping application riak_kv - marked service down.
```
This is the log file of the LevelDB partition:
```
2024/06/08-07:35:26.275337 7f98e97e2640 Version: 2.0.36 (enterprise edition)
2024/06/08-07:35:26.275362 7f98e97e2640 Options.comparator: leveldb.InternalKeyComparator
2024/06/08-07:35:26.275365 7f98e97e2640 Options.create_if_missing: 1
2024/06/08-07:35:26.275369 7f98e97e2640 Options.error_if_exists: 0
2024/06/08-07:35:26.275372 7f98e97e2640 Options.paranoid_checks: 0
2024/06/08-07:35:26.275375 7f98e97e2640 Options.verify_compactions: 1
2024/06/08-07:35:26.275378 7f98e97e2640 Options.env: 0x7f99700049d0
2024/06/08-07:35:26.275382 7f98e97e2640 Options.info_log: 0x7f991c003260
2024/06/08-07:35:26.275385 7f98e97e2640 Options.write_buffer_size: 43049366
2024/06/08-07:35:26.275388 7f98e97e2640 Options.max_open_files: 1000
2024/06/08-07:35:26.275392 7f98e97e2640 Options.block_cache: 0x7f991c002070
2024/06/08-07:35:26.275395 7f98e97e2640 Options.block_size: 4096
2024/06/08-07:35:26.275398 7f98e97e2640 Options.block_size_steps: 16
2024/06/08-07:35:26.275401 7f98e97e2640 Options.block_restart_interval: 16
2024/06/08-07:35:26.275405 7f98e97e2640 Options.compression: 2
2024/06/08-07:35:26.275408 7f98e97e2640 Options.filter_policy: leveldb.BuiltinBloomFilter2
2024/06/08-07:35:26.275411 7f98e97e2640 Options.is_repair: false
2024/06/08-07:35:26.275414 7f98e97e2640 Options.is_internal_db: false
2024/06/08-07:35:26.275417 7f98e97e2640 Options.total_leveldb_mem: 11518763827
2024/06/08-07:35:26.275421 7f98e97e2640 Options.block_cache_threshold: 33554432
2024/06/08-07:35:26.275424 7f98e97e2640 Options.limited_developer_mem: false
2024/06/08-07:35:26.275427 7f98e97e2640 Options.mmap_size: 0
2024/06/08-07:35:26.275430 7f98e97e2640 Options.delete_threshold: 1000
2024/06/08-07:35:26.275434 7f98e97e2640 Options.fadvise_willneed: false
2024/06/08-07:35:26.275437 7f98e97e2640 Options.tiered_slow_level: 0
2024/06/08-07:35:26.275440 7f98e97e2640 Options.tiered_fast_prefix: /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328
2024/06/08-07:35:26.275443 7f98e97e2640 Options.tiered_slow_prefix: /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328
2024/06/08-07:35:26.275446 7f98e97e2640 crc32c: hardware
2024/06/08-07:35:26.275449 7f98e97e2640 Options.cache_object_warming: true
2024/06/08-07:35:26.275453 7f98e97e2640 Options.ExpiryActivated: false
2024/06/08-07:35:26.275456 7f98e97e2640 ExpiryModuleEE.expiry_enabled: false
2024/06/08-07:35:26.275459 7f98e97e2640 ExpiryModuleEE.expiry_minutes: 0
2024/06/08-07:35:26.275462 7f98e97e2640 ExpiryModuleEE.whole_files: true
2024/06/08-07:35:26.275465 7f98e97e2640 File cache size: 1418935620
2024/06/08-07:35:26.275468 7f98e97e2640 Block cache size: 1421032772
2024/06/08-07:35:26.278268 7f98e97e2640 File cache warmed with 0 files.
2024/06/08-07:35:26.278306 7f98e97e2640 Wrote 0 file cache objects for warming.
```
If recovering the stuck node proves too difficult, I'd be open to marking it down and replacing it with a new empty node. However, my concern is data recovery on the remaining nodes. Can you advise on the best approach to ensure the new node receives the lost data through Riak's data replication mechanisms?
I have tried to repair the partition following https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/ but had no luck.
From the looks of that, it seems that your partition is corrupted. Although you said that you tried the https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/ page, did you try the section dedicated to LevelDB corruption (https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/#leveldb)?
If you did and it didn't work, the next step would be to take advantage of Riak's fault tolerance. By default, Riak has an n_val of 3, i.e. it stores three copies of all data you put into it. As your partition appears to be corrupted, you should still have two good copies of your data.
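If you want to confirm the replication factor before deleting anything, you can read the bucket properties over the HTTP API. A minimal check, assuming the default HTTP listener on port 8098; `mybucket` is just a placeholder bucket name:

```
# Read the bucket properties and pull out n_val (3 by default).
# 127.0.0.1:8098 is the default HTTP listener; 'mybucket' is a placeholder.
curl -s http://127.0.0.1:8098/buckets/mybucket/props | grep -o '"n_val":[0-9]*'
```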
What we can do is:

1. Stop Riak on the affected node: `riak stop` (or `systemctl stop riak`).
2. Back up the corrupted partition to somewhere outside of Riak's data directory: `cp -r /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328 /path/to/a/backup/location/outside/of/riak`.
3. Empty the partition directory: `rm -rf /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328/*`.
4. Start Riak again: `riak start` (or `systemctl start riak`) and wait for it to come up fully: `riak-admin wait-for-service riak_kv`.
5. From a `riak attach` session, trigger a repair of the partition from its replicas: `riak_kv_vnode:repair(959110449498405040071168171470060731649205731328).`
6. Monitor the repair's progress with `riak-admin transfers`.
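Putting those steps together, here is a rough sketch of the whole sequence, assuming the data path shown in your LevelDB log above; the backup location is a placeholder you should change:

```
# Run on the node that owns the corrupted partition.
PARTITION=959110449498405040071168171470060731649205731328
DATA_DIR=/var/lib/riak/leveldb/$PARTITION
BACKUP_DIR=/root/riak-partition-backup    # placeholder backup location

riak stop                                 # or: systemctl stop riak

mkdir -p "$BACKUP_DIR"
cp -r "$DATA_DIR" "$BACKUP_DIR/"          # keep a copy of the corrupt partition
rm -rf "${DATA_DIR:?}"/*                  # empty the partition directory

riak start                                # or: systemctl start riak
riak-admin wait-for-service riak_kv       # block until riak_kv is up

# Then, from an interactive `riak attach` session, trigger the repair of
# this partition from its replicas:
#     riak_kv_vnode:repair(959110449498405040071168171470060731649205731328).
# and watch its progress with:
riak-admin transfers
```

The repair reads the partition's data back from the other replicas, so it can take a while on a large partition.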
Hopefully this will fix your broken partition.
For future reference, you do not need to remove nodes from the ring for basic maintenance such as hardware upgrades. You can instead just stop the node and then, from another node in the cluster, mark the stopped node as down with `riak-admin down <nodename>`, e.g. `riak-admin down riak3@192.168.10.3`.
When you finish the hardware maintenance, simply restart Riak; provided that the node's IP address has not changed, it should carry on as if nothing had ever happened.
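As a rough sketch of that workflow, reusing the example node name from above (replace it with the actual name of the node being maintained):

```
# On the node going down for maintenance:
riak stop                                # or: systemctl stop riak

# From any other node in the cluster, mark it as down:
riak-admin down riak3@192.168.10.3       # example node name from above

# ...do the disk/hardware work, then back on the maintained node:
riak start                               # or: systemctl start riak
riak-admin wait-for-service riak_kv riak3@192.168.10.3

# The node should show as 'valid' again, and hinted handoff will deliver
# any writes it missed while it was down:
riak-admin member-status
riak-admin transfers
```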