cluster-computing high-availability pacemaker drbd corosync

DRBD & Corosync - My DRBD works and reports UpToDate, but the nodes are not in sync


I have a two-node high-availability cluster with a DRBD resource, a virtual IP, and the MariaDB files stored on the DRBD partition.

Everything seems to work, but DRBD is not syncing the latest files I created, even though the DRBD status reports the disk as UpToDate.

sudo drbdadm status 
iba role:Primary
  disk:UpToDate

pcs does not show any errors either:

sudo pcs status 
Cluster name: cluster_iba
Cluster Summary:
  * Stack: corosync
  * Current DC: iba2-ip192 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Feb 22 18:16:20 2022
  * Last change:  Mon Feb 21 16:19:38 2022 by root via cibadmin on iba1-ip192
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ iba1-ip192 iba2-ip192 ]

Full List of Resources:
  * virtual_ip  (ocf::heartbeat:IPaddr2):    Started iba2-ip192
  * Clone Set: DrbdData-clone [DrbdData] (promotable):
    * Masters: [ iba2-ip192 ]
    * Slaves: [ iba1-ip192 ]
  * DrbdFS  (ocf::heartbeat:Filesystem):     Started iba2-ip192
  * WebServer   (ocf::heartbeat:apache):     Started iba2-ip192
  * Maria   (ocf::heartbeat:mysql):  Started iba2-ip192

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

All constraints:

sudo pcs constraint list --full
Location Constraints:
Ordering Constraints:
  promote DrbdData-clone then start DrbdFS (kind:Mandatory) (id:order-DrbdData-clone-DrbdFS-mandatory)
  start DrbdFS then start virtual_ip (kind:Mandatory) (id:order-DrbdFS-virtual_ip-mandatory)
  start virtual_ip then start WebServer (kind:Mandatory) (id:order-virtual_ip-WebServer-mandatory)
  start DrbdFS then start Maria (kind:Mandatory) (id:order-DrbdFS-Maria-mandatory)
Colocation Constraints:
  DrbdFS with DrbdData-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DrbdFS-DrbdData-clone-INFINITY)
  virtual_ip with DrbdFS (score:INFINITY) (id:colocation-virtual_ip-DrbdFS-INFINITY)
  WebServer with virtual_ip (score:INFINITY) (id:colocation-WebServer-virtual_ip-INFINITY)
  Maria with DrbdFS (score:INFINITY) (id:colocation-Maria-DrbdFS-INFINITY)
Ticket Constraints:

The files in /mnt/datosDRBD on node iba2-ip192 (when it is the master):

/mnt/datosDRBD$ ls -l
total 80
-rw-r--r-- 1 root  root   5801 feb 21 12:16 drbd_cfg
-rw-r--r-- 1 root  root  10494 feb 21 12:18 fs_cfg
drwx------ 2 root  root  16384 feb 21 10:12 lost+found
drwxr-xr-x 4 mysql mysql  4096 feb 22 18:00 mariaDB
-rw-r--r-- 1 root  root  17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root  root      5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root  root  13578 feb 21 12:21 WebServer_cfg

And the files in /mnt/datosDRBD on node iba1-ip192 (when it is the master):

ls -l
total 92
-rw-r--r-- 1 root     root      5801 feb 21 12:16 drbd_cfg
drwxrwxrwx 5 www-data www-data  4096 feb 22 13:41 FilesSGITV
-rw-r--r-- 1 root     root     10494 feb 21 12:18 fs_cfg
drwx------ 2 root     root     16384 feb 21 10:12 lost+found
drwxr-xr-x 7 mysql    mysql     4096 feb 22 17:55 mariaDB
-rw-r--r-- 1 root     root     17942 feb 21 12:39 MariaDB_cfg
-rw-r--r-- 1 root     root         5 feb 22 17:58 testMParicio2.txt
-rw-r--r-- 1 www-data www-data     9 feb 22 17:58 testMParicio3.txt
-rw-r--r-- 1 root     root         5 feb 21 10:13 testMParicio.txt
-rw-r--r-- 1 root     root     13578 feb 21 12:21 WebServer_cfg

All the new files (testMParicio2.txt, testMParicio3.txt and the folder FilesSGITV) are missing on iba2-ip192.

I do not know what to do. I am very lost.

I appreciate any help, thanks.

(EDIT)

My DRBD config, on both nodes:

cat /etc/drbd.conf 
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

And my *.res config, also on both nodes:

resource iba {
        device /dev/drbd0;
        disk /dev/md3;
        meta-disk internal;
        on iba1 {
                address 10.0.0.248:7789;
        }
        on iba2 {
                address 10.0.0.249:7789;
        }
}

DRBD uses iba1 and iba2, with IPs 10.0.0.248 and 10.0.0.249.

Corosync uses iba1-ip192 and iba2-ip192, with IPs 192.168.1.248 and 192.168.1.249.

cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 iba1
10.0.0.248  iba1
10.0.0.249  iba2
192.168.1.248 iba1-ip192
192.168.1.249 iba2-ip192

cat /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    
    udev-always-use-vnr; # treat implicit the same as explicit volumes

}

common {
    handlers {
    }

    startup {
    }

    options {
    }

    disk {
    }

    net {
        protocol C;
    }
}

(EDIT 2)

I have found a problem in /proc/drbd.

On the primary node:

cat /proc/drbd 
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C 
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:2284 dr:11625 al:6 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:42364728

On the secondary node:

cat /proc/drbd 
version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C 
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:36538580
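
In both outputs above, cs:StandAlone means the node has no connection to its peer, ds:UpToDate/DUnknown only says that the local disk is considered up to date while the peer's state is unknown, and the non-zero oos value is the amount of data marked out of sync. The following is a minimal sketch of a check for that situation, assuming the DRBD 8.4 /proc/drbd format shown here. The function name check_drbd_state is hypothetical, and the check is deliberately strict: it also flags states such as an ongoing resync.

```shell
#!/bin/sh
# check_drbd_state (hypothetical helper): reads /proc/drbd-style text
# (DRBD 8.4 format) on stdin. Prints one line per device and returns
# non-zero unless every device is Connected with both disks UpToDate.
check_drbd_state() {
    awk '
        BEGIN { bad = 0 }
        / cs:/ {
            cs = ""; ds = ""
            # Pick the connection state and disk states out of the status line.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            minor = $1
            sub(/:$/, "", minor)
            if (cs == "Connected" && ds == "UpToDate/UpToDate") {
                printf "drbd%s OK\n", minor
            } else {
                printf "drbd%s PROBLEM cs=%s ds=%s\n", minor, cs, ds
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

It could be run on each node as `check_drbd_state < /proc/drbd`, for example from a cron job or a monitoring agent, since pcs status stays green in this situation.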

The secondary node did not remember the SSH key of the primary, which I fixed with:

ssh-keygen  -R 10.0.0.248
ssh-copy-id iba@iba1

But DRBD is still in StandAlone status, and I don't know how to continue.


Solution

  • I found a split-brain that did not show up in the pcs status.

    sudo journalctl | grep Split-Brain
    feb 21 13:00:10 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
    feb 21 13:21:40 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
    feb 21 13:27:54 ibatec1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
    

    I stopped the cluster (using --force on the master). Then, on the split-brain victim (assuming the DRBD resource is iba):

    drbdadm disconnect iba
    drbdadm secondary iba
    drbdadm connect --discard-my-data iba
    

    On split-brain survivor:

    drbdadm primary iba
    drbdadm connect iba
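
Because the split-brain only showed up in the kernel log and not in pcs status, it may be worth checking the log explicitly. The sketch below assumes the kernel message text shown in the journalctl output above. The function name split_brain_report is hypothetical.

```shell
#!/bin/sh
# split_brain_report (hypothetical helper): reads log lines on stdin,
# counts DRBD "Split-Brain detected" kernel messages, and returns
# non-zero if at least one was found.
split_brain_report() {
    awk '
        BEGIN { count = 0 }
        /Split-Brain detected/ {
            count++
            last = $0    # remember the most recent matching line
        }
        END {
            if (count > 0) {
                printf "split-brain events: %d\n", count
                printf "last: %s\n", last
                exit 1
            }
            print "no split-brain events found"
        }
    '
}
```

For example, `sudo journalctl | split_brain_report` would summarize the three events listed above and report the last one, from feb 21 13:27:54.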