erlang, riak, leveldb, riak-kv

Riak leaving node stuck indefinitely waiting to hand off its partition to a crashed node


I have a 5-node Riak cluster running version 2.9.10. All nodes are on the same version.

I attempted to remove one node from the cluster to upgrade its disk. However, the node became stuck while handing off its last partition to another node in the cluster. Unfortunately, the receiving node crashed with an unrecoverable error. As a result, the Riak service on the stuck node automatically shut down. Here is the console log.

```
2024-06-08 07:35:26.278 [error] <0.801.0>@riak_kv_vnode:init:856 Failed to start riak_kv_eleveldb_backend backend for index 959110449498405040071168171470060731649205731328 error: {db_open,"Corruption: truncated record at end of file"}
2024-06-08 07:35:26.283 [notice] <0.801.0>@riak:stop:43 "backend module failed to start."
2024-06-08 07:35:26.283 [error] <0.801.0> gen_fsm <0.801.0> in state started terminated with reason: no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380
2024-06-08 07:35:26.283 [error] <0.801.0> CRASH REPORT Process <0.801.0> with 1 neighbours exited with reason: no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380 in gen_fsm:terminate/7 line 600
2024-06-08 07:35:26.283 [error] <0.170.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.801.0> exit with reason no function clause matching riak_kv_vnode:terminate({bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}}, undefined) line 2380 in context child_terminated
2024-06-08 07:35:26.284 [error] <0.167.0> Supervisor riak_core_sup had child riak_core_vnode_manager started with riak_core_vnode_manager:start_link() at <0.215.0> exit with reason {{function_clause,[{riak_kv_vnode,terminate,[{bad_return_value,{stop,{db_open,"Corruption: truncated record at end of file"}}},undefined],[{file,"src/riak_kv_vnode.erl"},{line,2380}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,941}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},{gen_fsm,sync_send_event,[<0.801.0>,wait_for_init,infinity]}} in context child_terminated
2024-06-08 07:35:26.285 [info] <0.385.0>@riak_kv_app:prep_stop:267 Stopping application riak_kv - marked service down.
```

This is the LevelDB log file of that partition.

```
2024/06/08-07:35:26.275337 7f98e97e2640                        Version: 2.0.36 (enterprise edition)
2024/06/08-07:35:26.275362 7f98e97e2640             Options.comparator: leveldb.InternalKeyComparator
2024/06/08-07:35:26.275365 7f98e97e2640      Options.create_if_missing: 1
2024/06/08-07:35:26.275369 7f98e97e2640        Options.error_if_exists: 0
2024/06/08-07:35:26.275372 7f98e97e2640        Options.paranoid_checks: 0
2024/06/08-07:35:26.275375 7f98e97e2640     Options.verify_compactions: 1
2024/06/08-07:35:26.275378 7f98e97e2640                    Options.env: 0x7f99700049d0
2024/06/08-07:35:26.275382 7f98e97e2640               Options.info_log: 0x7f991c003260
2024/06/08-07:35:26.275385 7f98e97e2640      Options.write_buffer_size: 43049366
2024/06/08-07:35:26.275388 7f98e97e2640         Options.max_open_files: 1000
2024/06/08-07:35:26.275392 7f98e97e2640            Options.block_cache: 0x7f991c002070
2024/06/08-07:35:26.275395 7f98e97e2640             Options.block_size: 4096
2024/06/08-07:35:26.275398 7f98e97e2640       Options.block_size_steps: 16
2024/06/08-07:35:26.275401 7f98e97e2640 Options.block_restart_interval: 16
2024/06/08-07:35:26.275405 7f98e97e2640            Options.compression: 2
2024/06/08-07:35:26.275408 7f98e97e2640          Options.filter_policy: leveldb.BuiltinBloomFilter2
2024/06/08-07:35:26.275411 7f98e97e2640              Options.is_repair: false
2024/06/08-07:35:26.275414 7f98e97e2640         Options.is_internal_db: false
2024/06/08-07:35:26.275417 7f98e97e2640      Options.total_leveldb_mem: 11518763827
2024/06/08-07:35:26.275421 7f98e97e2640  Options.block_cache_threshold: 33554432
2024/06/08-07:35:26.275424 7f98e97e2640  Options.limited_developer_mem: false
2024/06/08-07:35:26.275427 7f98e97e2640              Options.mmap_size: 0
2024/06/08-07:35:26.275430 7f98e97e2640       Options.delete_threshold: 1000
2024/06/08-07:35:26.275434 7f98e97e2640       Options.fadvise_willneed: false
2024/06/08-07:35:26.275437 7f98e97e2640      Options.tiered_slow_level: 0
2024/06/08-07:35:26.275440 7f98e97e2640     Options.tiered_fast_prefix: /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328
2024/06/08-07:35:26.275443 7f98e97e2640     Options.tiered_slow_prefix: /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328
2024/06/08-07:35:26.275446 7f98e97e2640                         crc32c: hardware
2024/06/08-07:35:26.275449 7f98e97e2640   Options.cache_object_warming: true
2024/06/08-07:35:26.275453 7f98e97e2640        Options.ExpiryActivated: false
2024/06/08-07:35:26.275456 7f98e97e2640  ExpiryModuleEE.expiry_enabled: false
2024/06/08-07:35:26.275459 7f98e97e2640  ExpiryModuleEE.expiry_minutes: 0
2024/06/08-07:35:26.275462 7f98e97e2640     ExpiryModuleEE.whole_files: true
2024/06/08-07:35:26.275465 7f98e97e2640                File cache size: 1418935620
2024/06/08-07:35:26.275468 7f98e97e2640               Block cache size: 1421032772
2024/06/08-07:35:26.278268 7f98e97e2640 File cache warmed with 0 files.
2024/06/08-07:35:26.278306 7f98e97e2640 Wrote 0 file cache objects for warming.
```

If recovering the stuck node proves too difficult, I'd be open to marking it down and replacing it with a new empty node. However, my concern is data recovery on the remaining nodes. Can you advise on the best approach to ensure the new node receives the lost data through Riak's data replication mechanisms?

I have tried to repair the partition following https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/ but had no luck.


Solution

  • From the looks of that, it seems that your partition is corrupted. Although you said that you tried the https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/ page, did you try the section dedicated to LevelDB corruption (https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/#leveldb)?
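    In case it is worth another try, the core of that LevelDB repair is stopping the node and running eleveldb's repair routine against the damaged partition directory from an Erlang shell. A rough sketch only, assuming a default package layout for the erl and library paths (adjust them to your install):

    ```
    # Stop Riak first so nothing else has the partition open
    riak stop

    # Start an Erlang shell with Riak's bundled libraries on the code path
    # (these paths are assumptions for a default package install)
    ERL_LIBS=/usr/lib/riak/lib /usr/lib/riak/erts-*/bin/erl

    # Inside that Erlang shell, run the repair against the damaged partition
    # directory and wait for it to return ok:
    #   eleveldb:repair("/var/lib/riak/leveldb/959110449498405040071168171470060731649205731328", []).
    ```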

    If you did and it didn't work, the next step would be to take advantage of Riak's fault tolerance. By default, Riak has an n_val of 3, i.e. it stores three copies of all data you put into it. As your partition appears to be corrupted, that means you should still have two good copies of your data.
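    If you want to double-check the replication factor in play, the bucket properties include it. A quick way to look, assuming the default HTTP listener settings and with mybucket as a placeholder bucket name:

    ```
    # Bucket properties are returned as JSON; look for "n_val":3 in the output
    curl -s http://127.0.0.1:8098/buckets/mybucket/props
    ```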

    What we can do is (a consolidated sketch of these commands follows the list):

    1. Stop Riak on the target node via riak stop or systemctl stop riak
    2. Just in case, back up the damaged partition to somewhere else before we proceed e.g. cp -r /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328 /path/to/a/backup/location/outside/of/riak
    3. Delete the contents of the damaged partition e.g. rm -rf /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328/*
    4. Start Riak on the target node via riak start or systemctl start riak and then wait for Riak to come up fully (riak-admin wait-for-service riak_kv)
    5. Perform the partition repair steps listed at https://www.tiot.jp/riak-docs/riak/kv/2.9.10/using/repair-recovery/repairs/#repairing-a-single-partition and substitute in the name of your partition (riak_kv_vnode:repair(959110449498405040071168171470060731649205731328). from a riak attach session)
    6. Monitor the repair process with riak-admin transfers
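    Putting those six steps together as a rough shell sketch (the partition name is taken from your logs; paths and service names may need adjusting for your install):

    ```
    # 1. Stop Riak on the node that owns the damaged partition
    riak stop                       # or: systemctl stop riak

    # 2. Back up the damaged partition directory, just in case
    cp -r /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328 \
          /path/to/a/backup/location/outside/of/riak

    # 3. Delete the contents of the damaged partition
    rm -rf /var/lib/riak/leveldb/959110449498405040071168171470060731649205731328/*

    # 4. Start Riak again and wait for riak_kv to come up fully
    riak start                      # or: systemctl start riak
    riak-admin wait-for-service riak_kv

    # 5. Trigger the partition repair from an attached Erlang shell
    riak attach
    #   riak_kv_vnode:repair(959110449498405040071168171470060731649205731328).
    # Detach once it returns; do not call q() or init:stop() from the attached
    # shell, as that would stop the node itself.

    # 6. Monitor progress until the repair transfers complete
    riak-admin transfers
    ```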

    Hopefully this should fix your broken partition.

    For future reference, you do not need to remove nodes from the ring for basic maintenance such as hardware upgrades. You can instead just stop the node and then, from another node in the cluster, mark the stopped node as down with riak-admin down <nodename>, e.g. riak-admin down riak3@192.168.10.3.

    When you finish the hardware maintenance, simply restart Riak and, provided that the node's IP address has not changed, it should continue as if nothing had ever happened.
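    For example, using riak3@192.168.10.3 as a stand-in for the node being maintained:

    ```
    # On the node being maintained: stop Riak before the hardware work
    riak stop

    # From any other node in the cluster: mark the stopped node as down
    riak-admin down riak3@192.168.10.3

    # ... perform the disk/hardware maintenance ...

    # Back on the maintained node: start Riak again (same IP address)
    riak start
    riak-admin wait-for-service riak_kv

    # From any node: confirm the ring has settled and handoffs are clean
    riak-admin ring-status
    riak-admin transfers
    ```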