I use Monit 5.27.2 and MariaDB 10.5.15 on my Debian 11 servers.
My servers are set up in Multi-Source MariaDB Replication.
Sometimes, replication stops and I am not notified. I have not seen anywhere a prepared configuration for Monit to send such alerts.
The errors are visible with the following query:
SHOW SLAVE STATUS;
With those values particularly interesting:
I have not found any way to use Monit to monitor directly the result of a SQL statement. However, it seems possible to monitor the return code of a command with versions before Monit 5.29.0 (and the output of a command after).
I can probably prepare a Bash/Shell script that will check the output of the SQL statement and then a Monit config that will send me an alert in case of issue. If I do, I will post this as an answer here. I was just wondering whether there was already a known solution to this.
I expect this to work with MySQL.
I intend to test this with those SQL queries:
STOP SLAVE;
START SLAVE;
I got this to work as I expected. I did this on all servers sources of the replication.
Here is the file monit_replication.sh
:
#!/bin/bash
user="YOUR_USER"
password="YOUR_PASSWORD"
result=`echo "show slave status"|mysql -EsB --user="$user" --password="$password"`
contains_with_yes(){
if echo "$1"|grep -i "$2"|grep -i "Yes" > /dev/null; then
echo "$2 ok"
else
echo "$2 not ok"
exit 1
fi
}
contains_with_yes "$result" "Slave_IO_Running"
contains_with_yes "$result" "Slave_SQL_Running"
exit 0
Made this executable:
chmod 700 /YOUR_PATH_TO_THE_FILE/monit_replication.sh
Added to /etc/monit/conf-enabled/mysql
:
check program MySQL_replication with path "/YOUR_PATH_TO_THE_FILE/monit_replication.sh"
every 120 cycles
if status > 0 then alert
group mysql
Ran this SQL:
STOP SLAVE;
Restarted monit:
/etc/init.d/monit restart
Found in /var/log/monit.log
:
[2022-07-04T10:46:55+0000] error : 'MySQL_replication' status failed (1) -- Slave_IO_Running not ok
Received this email:
monit alert -- Status failed MySQL_replication
Status failed Service MySQL_replication
Date: Mon, 04 Jul 2022 10:46:55
Action: alert
Host: YOUR_HOST
Description: status failed (1) -- Slave_IO_Running not ok
Your faithful employee,
Monit
This validates the test. Restoring replication with this SQL query:
START SLAVE;
If any simpler solution based on Monit can alert me of a replication failure on MariaDB, I would be happy to accept it as the solution to the question.