mongodbwiredtiger

Mongodb 4.4. replica set members stuck on startup (WT_VERB_RECOVERY step)


After a replica member restart (possibly unclean), the server startup process is stuck without any indication of CPU or disk IO usage.

The log outputs are the following:


{"t":{"$date":"2021-09-20T18:15:43.621+00:00"},"s":"I", "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2021-09-20T18:15:43.623+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2021-09-20T18:15:43.634+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-09-20T18:15:43.634+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-09-20T18:15:43.634+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-09-20T18:15:43.703+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":15724,"port":27017,"dbPath":"/mnt/mongodb_volume/mongodb_db","architecture":"64-bit","host":"mongodb-his
tory-nyc3-r1.supplyspy.com"}}
{"t":{"$date":"2021-09-20T18:15:43.703+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.8","gitVersion":"83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0","openSSLVersion":"OpenSSL 1.1
.1  11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-09-20T18:15:43.703+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}}
{"t":{"$date":"2021-09-20T18:15:43.703+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/mnt/mongodb_volume/mongodb_db/automation-mongod.conf","net":{"bindIp":"0.0.0.
0","port":27017},"operationProfiling":{"mode":"slowOp","slowOpThresholdMs":4000},"processManagement":{"fork":true,"timeZoneInfo":"/usr/share/zoneinfo"},"replication":{"enableMajorityReadConcern":false,"replSetName":"history-main"},"security":{"authoriz
ation":"enabled","keyFile":"/var/lib/mongodb-mms-automation/keyfile"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-1"},"storage":{"dbPath":"/mnt/mongodb_volume/mongodb_db","engine":"wiredTiger","journal":{"commitIntervalMs":200,"enabled":true}
},"systemLog":{"destination":"file","logAppend":true,"path":"/var/log/mongodb/mongod.log"}}}}
{"t":{"$date":"2021-09-20T18:15:43.704+00:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"/mnt/mongodb_volume/mongodb_db/mongod.lock"}}
{"t":{"$date":"2021-09-20T18:15:43.704+00:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
{"t":{"$date":"2021-09-20T18:15:43.704+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=3476M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false
,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progr
ess],"}}
{"t":{"$date":"2021-09-20T18:15:45.519+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161745:519173][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167037 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:45.892+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161745:892803][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167038 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.051+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:51008][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log
 167039 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.141+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:141837][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROG
RESS] Main recovery loop: starting at 167037/84361856 to 167039/256"}}
{"t":{"$date":"2021-09-20T18:15:46.142+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:142283][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167037 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.434+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:434289][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY_PRO
GRESS] Recovering log 167038 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.745+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:745696][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY_PRO
GRESS] Recovering log 167039 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.800+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:800525][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY | W
T_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1632161384, 162)"}}
{"t":{"$date":"2021-09-20T18:15:46.800+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:800622][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY | W
T_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1632161379, 162)"}}

I'm not receiving any other output for more than 2 hours already.

Data size on disk is ~700GB.


Edit: After long wait time I got more output:

{"t":{"$date":"2021-09-20T22:23:55.310+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:306696][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 66 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.320+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:320968][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 67 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.354+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:354841][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 68 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.355+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:355064][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 69 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.359+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:359206][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 70 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.362+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:362785][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 71 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.370+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:370297][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 72 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.374+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:374281][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 73 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.382+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:382329][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 74 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.560+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:560668][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 75 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563027][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 76 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563135][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 77 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563211][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 78 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.611+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:611968][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 79 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.052+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:52779][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 80 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.059+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:59610][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 81 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.071+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:71578][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 82 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.078+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:78379][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 83 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.082+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:82011][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 84 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.084+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:84104][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 85 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.086+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:86957][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 86 files. For more detailed logging, enable WT_VERB_RTS"}}



Solution

  • "Detected unclean shutdown" means the server was not stopped gracefully. Depending on the nature of the shutdown, there may be inconsistency or corruption present in the data files.

    So now it is checking the data files and rolling them back to a self-consistent state.

    The "has been running for" messages show it is making progress. Let it finish and it will hopefully be able to return the data to a usable state.