I have a two-node high-availability cluster with a DRBD resource, a virtual IP, and the MariaDB files stored on the DRBD partition.
Everything seems to work, but DRBD is not replicating the latest files I created, even though drbdadm status reports the disk as UpToDate:
sudo drbdadm status
iba role:Primary
disk:UpToDate
pcs does not show any errors either:
sudo pcs status
Cluster name: cluster_iba
Cluster Summary:
* Stack: corosync
* Current DC: iba2-ip192 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue Feb 22 18:16:20 2022
* Last change: Mon Feb 21 16:19:38 2022 by root via cibadmin on iba1-ip192
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ iba1-ip192 iba2-ip192 ]
Full List of Resources:
* virtual_ip (ocf::heartbeat:IPaddr2): Started iba2-ip192
* Clone Set: DrbdData-clone [DrbdData] (promotable):
* Masters: [ iba2-ip192 ]
* Slaves: [ iba1-ip192 ]
* DrbdFS (ocf::heartbeat:Filesystem): Started iba2-ip192
* WebServer (ocf::heartbeat:apache): Started iba2-ip192
* Maria (ocf::heartbeat:mysql): Started iba2-ip192
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
All constraints:
sudo pcs constraint list --full
Location Constraints:
Ordering Constraints:
promote DrbdData-clone then start DrbdFS (kind:Mandatory) (id:order-DrbdData-clone-DrbdFS-mandatory)
start DrbdFS then start virtual_ip (kind:Mandatory) (id:order-DrbdFS-virtual_ip-mandatory)
start virtual_ip then start WebServer (kind:Mandatory) (id:order-virtual_ip-WebServer-mandatory)
start DrbdFS then start Maria (kind:Mandatory) (id:order-DrbdFS-Maria-mandatory)
Colocation Constraints:
DrbdFS with DrbdData-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DrbdFS-DrbdData-clone-INFINITY)
virtual_ip with DrbdFS (score:INFINITY) (id:colocation-virtual_ip-DrbdFS-INFINITY)
WebServer with virtual_ip (score:INFINITY) (id:colocation-WebServer-virtual_ip-INFINITY)
Maria with DrbdFS (score:INFINITY) (id:colocation-Maria-DrbdFS-INFINITY)
Ticket Constraints:
The files in /mnt/datosDRBD on node iba2-ip192 (when it is the master):
/mnt/datosDRBD$ ls -l
total 80
-rw-r--r-- 1 root root 5801 feb 21 12:16 drbd_cfg
-rw-r--r-- 1 root root 10494 feb 21 12:18 fs_cfg
drwx------ 2 root root 16384 feb 21 10:12 lost+found
drwxr-xr-x 4 mysql mysql 4096 feb 22 18:00 mariaDB
-rw-r--r-- 1 root root 17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root root 5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root root 13578 feb 21 12:21 WebServer_cfg
And the files in /mnt/datosDRBD on node iba1-ip192 (when it is the master):
ls -l
total 92
-rw-r--r-- 1 root root 5801 feb 21 12:16 drbd_cfg
drwxrwxrwx 5 www-data www-data 4096 feb 22 13:41 FilesSGITV
-rw-r--r-- 1 root root 10494 feb 21 12:18 fs_cfg
drwx------ 2 root root 16384 feb 21 10:12 lost+found
drwxr-xr-x 7 mysql mysql 4096 feb 22 17:55 mariaDB
-rw-r--r-- 1 root root 17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root root 5 feb 22 17:58 testMParicio2.txt
-rw-r--r-- 1 www-data www-data 9 feb 22 17:58 testMParicio3.txt
-rw-r--r-- 1 root root 5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root root 13578 feb 21 12:21 WebServer_cfg
All the new files (testMParicio2.txt and testMParicio3.txt) and the folder FilesSGITV are missing on iba2-ip192.
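A minimal sketch of how the two listings can be compared mechanically, assuming each node's file names were saved to a text file, one name per line (for example with ls -1 while that node had the filesystem mounted). The function name and the file names in the usage line are hypothetical.

```shell
# Print names that appear in the first listing but not in the second.
# Both arguments are text files with one name per line.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm needs sorted input; force the C locale so sort and comm agree.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    # -2 hides lines unique to the second file, -3 hides common lines.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage would look like only_in_first iba1_listing.txt iba2_listing.txt, which prints whatever exists only on the first node.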
I do not know what to do. I am very lost.
I appreciate any help, thanks.
(EDIT)
My DRBD config, identical on both nodes:
cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
And my *.res config, also identical on both nodes:
resource iba {
    device /dev/drbd0;
    disk /dev/md3;
    meta-disk internal;
    on iba1 {
        address 10.0.0.248:7789;
    }
    on iba2 {
        address 10.0.0.249:7789;
    }
}
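drbdadm picks the local "on" section by comparing its name with the output of uname -n, so it can be worth confirming that each node's name really appears in the resource file. A small sketch, assuming the resource file path is passed as an argument (the function names are hypothetical):

```shell
# List the host names declared in "on <name> {" sections of a resource file.
res_hosts() {
    sed -n 's/^[[:space:]]*on[[:space:]][[:space:]]*\([^[:space:]{]*\).*/\1/p' "$1"
}

# Succeed only if this node's "uname -n" is one of the declared hosts.
res_has_this_host() {
    res_hosts "$1" | grep -qxF "$(uname -n)"
}
```

Running res_has_this_host on the .res file on each node and checking the exit status shows whether the names match.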
DRBD uses iba1 and iba2, with IPs 10.0.0.248 and 10.0.0.249.
Corosync uses iba1-ip192 and iba2-ip192, with IPs 192.168.1.248 and 192.168.1.249.
cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 iba1
10.0.0.248 iba1
10.0.0.249 iba2
192.168.1.248 iba1-ip192
192.168.1.249 iba2-ip192
cat /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    udev-always-use-vnr; # treat implicit the same as explicit volumes
}
common {
    handlers {
    }
    startup {
    }
    options {
    }
    disk {
    }
    net {
        protocol C;
    }
}
(EDIT 2)
I have found a problem in /proc/drbd.
On the primary node:
cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:2284 dr:11625 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:42364728
On the secondary node:
cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C
0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:36538580
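The cs: field is the part that matters here: both nodes report StandAlone, so no replication is taking place at all. A minimal sketch for pulling that field out, assuming the DRBD 8.4 /proc/drbd layout shown above (function names are hypothetical; a file path can be passed to check saved output instead):

```shell
# Print "<minor> <connection state>" for each device in /proc/drbd-style text.
drbd_conn_states() {
    # Device lines look like " 0: cs:StandAlone ro:Primary/Unknown ..."
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\) .*/\1 \2/p' "${1:-/proc/drbd}"
}

# Report every device whose state is anything other than Connected
# (this also flags resync states, not only StandAlone).
drbd_report_disconnected() {
    drbd_conn_states "$1" | while read -r minor state; do
        if [ "$state" != "Connected" ]; then
            echo "drbd$minor: $state"
        fi
    done
}
```

On either node in the state shown above, drbd_report_disconnected would print a line for drbd0.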
The secondary node no longer recognized the SSH key of the other node; I fixed that with:
ssh-keygen -R 10.0.0.248
ssh-copy-id iba@iba1
But DRBD is still in StandAlone state.
I don't know how to continue.
I have found a split brain that did not show up in the pcs status:
sudo journalctl | grep Split-Brain
feb 21 13:00:10 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
feb 21 13:21:40 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
feb 21 13:27:54 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
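Since pcs does not surface these events, a small filter over the kernel log makes them easier to spot. A sketch, assuming the message text and the three-field timestamp format shown above (the function name is hypothetical):

```shell
# Read log text on stdin and print the timestamp (first three fields)
# of every line that reports a detected DRBD split brain.
split_brain_times() {
    awk '/Split-Brain detected/ { print $1, $2, $3 }'
}
```

For example, sudo journalctl -k | split_brain_times lists when each event was logged, and piping that into wc -l gives the count.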
I stopped the cluster (with --force on the master). Then, on the split-brain victim (assuming the DRBD resource is iba):
drbdadm disconnect iba
drbdadm secondary iba
drbdadm connect --discard-my-data iba
On the split-brain survivor:
drbdadm primary iba
drbdadm connect iba
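After reconnecting, /proc/drbd should eventually show cs:Connected and ds:UpToDate/UpToDate once the resync has finished. A minimal sketch of that check, assuming the DRBD 8.4 /proc/drbd format (the function name is hypothetical; a file path can be passed to test against saved output):

```shell
# Succeed only if at least one DRBD device is listed and every listed
# device is Connected with both the local and the peer disk UpToDate.
drbd_all_in_sync() {
    awk '
        /^ *[0-9]+: cs:/ {
            seen = 1
            if ($0 !~ /cs:Connected / || $0 !~ /ds:UpToDate\/UpToDate /) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    ' "${1:-/proc/drbd}"
}
```

It can be polled from a loop, for example until drbd_all_in_sync; do sleep 5; done, before starting the cluster again.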