Tags: network-programming, virtualization, opennebula

OpenNebula host network change, lost all connectivity for VMs


I recently changed the underlying host network configuration (moving VLAN tagging from host to switch) and it seems to have completely blocked any sort of network connectivity for my VMs.

I run OpenNebula 5.4.6 on Ubuntu 16.04.

I have 4 physical network interfaces which used to be configured on the host as such:

auto br_admin
iface br_admin inet dhcp
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br_service
iface br_service inet dhcp
    bridge_ports eno2
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto eno2.20
iface eno2.20 inet manual

auto eno2.30
iface eno2.30 inet manual

auto br_public
iface br_public inet dhcp
    bridge_ports eno2.20
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br_data
iface br_data inet dhcp
    bridge_ports eno2.30
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

I was able to move the VLAN bridges onto separate physical interfaces. Note that the other two bridges were left unmodified:

auto br_admin
iface br_admin inet dhcp
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br_service
iface br_service inet dhcp
    bridge_ports eno2
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br_public
iface br_public inet dhcp
    bridge_ports eno3
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br_data
iface br_data inet dhcp
    bridge_ports eno4
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

None of my VMs use br_public or br_data, and they do not form any part of the OpenNebula config, so I was very shocked when I found my VMs had lost connectivity after this change. I rebooted all the VMs and later the host, but the problem persists.

I deleted and recreated the Virtual Networks within OpenNebula, detached the old NICs, and added new ones to the VMs. I even created entirely new VMs from scratch, but I still can't get any network connection back.

Any ideas?? Thanks in advance...


Solution

  • After much debugging with tcpdump I discovered that my host bridge was receiving packets from the VM, but not forwarding them on. I ran the command below and network connectivity came back.

    # echo "0" > /proc/sys/net/bridge/bridge-nf-call-iptables
    

    I checked and found the below in /etc/sysctl.d/50-bridge-nf-call.conf

    net.bridge.bridge-nf-call-arptables = 0
    net.bridge.bridge-nf-call-ip6tables = 0
    net.bridge.bridge-nf-call-iptables = 0
    

    Reading around, I discovered a very old bug in Ubuntu that causes the sysctl configuration to be loaded too early in the boot process, before the bridge module has been loaded, so the net.bridge settings cannot be applied. The Examples section of Ubuntu's online sysctl.d man page explicitly explains how to use udev rules to apply the bridge filtering settings once the module is properly loaded.

    Once I added this, I restarted the host and confirmed that the VMs would retain connectivity.
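
For reference, the udev approach from that man page looks roughly like the sketch below. Treat the details as assumptions to check against your own system: the rule file name (99-bridge.rules) is arbitrary, the module name may be br_netfilter or bridge depending on the kernel version, and systemd-sysctl lives under /lib/systemd/ on Ubuntu 16.04 but under /usr/lib/systemd/ on some other distributions.

```
# /etc/udev/rules.d/99-bridge.rules  (file name is an assumption)
# When the bridge netfilter module is added, re-run systemd-sysctl so that
# only the net.bridge.* settings from /etc/sysctl.d/*.conf are applied again.
# KERNEL may need to be "bridge" instead of "br_netfilter" on older kernels.
ACTION=="add", SUBSYSTEM=="module", KERNEL=="br_netfilter", \
    RUN+="/lib/systemd/systemd-sysctl --prefix=/net/bridge"
```

With the existing /etc/sysctl.d/50-bridge-nf-call.conf left in place, running `sysctl net.bridge.bridge-nf-call-iptables` after a reboot should report 0 if the rule fired.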