We have a Xtradb cluster with three nodes. There is one node, which was not properly stopped and won't start. The other two nodes are correctly working and responding. The only thing in logs is this:
-- Unit mysql.service has begun starting up.
Aug 25 04:40:45 percona-prod-perconaxtradb-vm-0 /etc/init.d/mysql[2503]: MySQL PID not found, pid_file detected/guessed: /var/run/mysql
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: Starting MySQL (Percona XtraDB Cluster) database server: mysqld . . . . .
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: failed!
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: mysql.service: control process exited, code=exited status=1
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daem
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
In /var/lib/mysql/wsrep_recovery.qEEkjd
we found this:
2018-08-25T05:49:31.055887Z 0 [ERROR] Found 20 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2018-08-25T05:49:31.055892Z 0 [ERROR] Aborting
2018-08-25T05:49:31.055901Z 0 [Note] Binlog end
We would like to completely drop these 20 prepared transactions
.
The other two nodes are consistent and working, so it would be enough to tell this node "ignore your state and sync with other nodes".
In the end we removed the /data
folder on the dead node and restarted the node. The node then started SST
replication - which takes a long time and the only progress one can see is checking the growing size of the folder. But then it worked.