shell yaml griddb

Issues with file system level backup and recovery operations on GridDB Node


I am trying to restore backup data to a GridDB node, but one of the backups, which happens to be the one I badly need, keeps showing the status NG.

To troubleshoot the issue, I took the following steps:

I ensured that no node had been started and confirmed that the cluster definition file matched that of the other nodes in the cluster the node is joining. I also checked the backup name used in the recovery and verified the backup status so that I could select a backup that completed correctly. I made sure that no data files, checkpoint log files, or transaction log files were left in the node's database file directories (/var/lib/gridstore/data and /var/lib/gridstore/txnlog by default). I executed the restore command on the machine that starts the node and then started the node. Next, I used the command below to check the backup data:

gs_backuplist -u admin/admin

Here is the list of backups displayed:

BackupName   Status  StartTime                 EndTime
-------------------------------------------------------------------------
*201912           --  2019-12-01T05:20:00+09:00 2019-12-01T06:10:55+09:00
*201911           --  2019-11-01T05:20:00+09:00 2019-11-01T06:10:55+09:00
  :
 20191025NO2      OK  2019-10-25T06:37:10+09:00 2019-10-25T06:38:20+09:00
 20191025         NG  2019-10-25T02:13:34+09:00 -
 20191018         OK  2019-10-18T02:10:00+09:00 2019-10-18T02:12:15+09:00

gs_backuplist -u admin/admin 201912

BackupName : 201912

BackupData            Status StartTime                 EndTime
--------------------------------------------------------------------------------
201912_lv0                OK 2019-12-01T05:20:00+09:00 2019-12-01T06:10:55+09:00
201912_lv1_000_001        OK 2019-12-02T05:20:00+09:00 2019-12-02T05:20:52+09:00
201912_lv1_000_002        OK 2019-12-03T05:20:00+09:00 2019-12-03T05:20:25+09:00
201912_lv1_000_003        OK 2019-12-04T05:20:00+09:00 2019-12-04T05:20:33+09:00
201912_lv1_000_004        OK 2019-12-05T05:20:00+09:00 2019-12-05T05:21:25+09:00
201912_lv1_000_005        OK 2019-12-06T05:20:00+09:00 2019-12-06T05:21:05+09:00
201912_lv1_001_000        OK 2019-12-07T05:20:00+09:00 2019-12-07T05:22:11+09:00
201912_lv1_001_001        OK 2019-12-08T05:20:00+09:00 2019-12-08T05:20:55+09:00

However, the 20191025 backup still shows the status NG, and I really need it restored. Kindly advise on how I can get around this issue.

I'm currently stuck and need assistance restoring this backup data to my GridDB node. Any help and guidance would be greatly appreciated.


Solution

  • If the status displayed is NG, the backup file may be damaged, so restoration is not possible.
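
To see at a glance which backups are candidates for restoration, the Status column of the listing can be filtered. This is only a minimal sketch: it assumes the listing has been saved to a file (backuplist.txt is a hypothetical name, filled here with rows copied from the question) and that the columns are laid out as in the question. The helper backups_with_status is made up for illustration and is not a GridDB command.

```shell
# Sample rows copied from the question. In practice the file could be
# produced with: gs_backuplist -u admin/admin > backuplist.txt
cat > backuplist.txt <<'EOF'
BackupName   Status  StartTime                 EndTime
-------------------------------------------------------------------------
*201912           --  2019-12-01T05:20:00+09:00 2019-12-01T06:10:55+09:00
*201911           --  2019-11-01T05:20:00+09:00 2019-11-01T06:10:55+09:00
 20191025NO2      OK  2019-10-25T06:37:10+09:00 2019-10-25T06:38:20+09:00
 20191025         NG  2019-10-25T02:13:34+09:00 -
 20191018         OK  2019-10-18T02:10:00+09:00 2019-10-18T02:12:15+09:00
EOF

# Hypothetical helper: print the names of backups whose Status column
# (second whitespace-separated field) equals the requested value.
# The first two lines (header and separator) are skipped.
backups_with_status() {
  awk -v want="$1" 'NR > 2 && $2 == want { print $1 }' backuplist.txt
}

backups_with_status OK   # backups that completed and can be considered for restore
backups_with_status NG   # backups that failed
```

With the rows from the question, the OK filter lists 20191025NO2 and 20191018. Since 20191025NO2 was taken later on the same day as the failed 20191025 backup, it may be the closest usable substitute.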

    Check which of the 201912 backup data is used in the recovery. The differential/incremental backup data used for recovery can be checked with the --test option of gs_restore. With --test, only the data that would be used for recovery is displayed; no data is restored. Use it as a preliminary check.
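
The preliminary check can be wrapped in a small function so that it fails clearly when run on a machine without the GridDB tools. This is a sketch under assumptions: preview_restore is a hypothetical helper name, and the command is assumed to be run on the node's machine by the OS user that administers GridDB.

```shell
# Hypothetical helper: show which backup data a restore would use,
# without restoring anything (gs_restore --test only displays the data).
preview_restore() {
  backup_name="$1"
  # Fail early if the GridDB command-line tools are not on PATH.
  if ! command -v gs_restore >/dev/null 2>&1; then
    echo "gs_restore not found on PATH" >&2
    return 127
  fi
  gs_restore --test "$backup_name"
}

# Example (run on the node's machine):
#   preview_restore 201912
```

Only once the preview lists the expected baseline and differential/incremental data would the actual restore be run with the same backup name.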

    When a specific partition fails, you need to check where the latest data of that partition is held.

    Run the gs_backuplist command on every node that makes up the cluster, specifying the ID of the partition you want to check with the --partitionId option. Use the node backup that contains the largest LSN for recovery.
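
Choosing the backup with the largest LSN can be sketched as follows. The exact layout of the gs_backuplist output with --partitionId is not shown in this post, so the sketch does not parse it. Instead it assumes the LSN values were noted by hand in a hypothetical file, lsn_by_node.txt, with one line per node in the form "node backup-name LSN". The node names, the partition ID 68, and the LSN values are all invented for illustration, and largest_lsn is a made-up helper.

```shell
# On each node, look up the LSN recorded for the partition, for example:
#   gs_backuplist -u admin/admin --partitionId=68
# (68 is a placeholder partition ID.) Then note the results by hand,
# one line per node: "<node> <backup name> <LSN>". Values below are invented.
cat > lsn_by_node.txt <<'EOF'
node1 201912 81512
node2 201912 81530
node3 201912 81498
EOF

# Hypothetical helper: print the line(s) with the largest LSN (third field).
# If several nodes share the largest LSN, all of them are printed.
largest_lsn() {
  awk '{ line[NR] = $0; lsn[NR] = $3 + 0
         if (NR == 1 || lsn[NR] > max) max = lsn[NR] }
       END { for (i = 1; i <= NR; i++) if (lsn[i] == max) print line[i] }' "$1"
}

largest_lsn lsn_by_node.txt
```

The node printed by the helper is the one whose backup would be used to recover that partition.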