I'm preparing for a server upgrade, but before doing so I want to do a dry run within a VM first.
I'm running Linux Mint on a laptop. Currently I have FreeNAS v9.10.2-U6 installed within QEMU and RancherOS v1.5.6 installed into a VM via iohyve.
[laptop]
|_ [QEMU]
   |_ [FreeNAS]
      |_ [iohyve]
         |_ [RancherOS]
I'm able to SSH into FreeNAS with no problem, but I can't SSH into Rancher; the connection eventually times out. When I run the ssh command with -vvv, it seems to hang on debug1: Connecting to <RANCHER_IP> [<RANCHER_IP>] port 22. before eventually timing out.
This is what I've tried so far:
ping <RANCHER_IP>
ps -ef | grep sshd
netstat -nl | grep :22
Checked the iptables rules on the Host and Guest; there doesn't appear to be a rule that would be blocking communication.
This is my first time dealing with networking within nested VMs, so I'm not certain if there's something simple I'm missing. I look forward to any insight the community may have.
TL;DR: I had to disable hardware offloading within the FreeNAS VM. For a persistent fix, within the FreeNAS GUI I went to Init/Shutdown Scripts and created a Post-Init Command script that ran:
ifconfig vtnet0 -rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum
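To confirm the flags actually took effect once the Post-Init command has run, one option is to look at the options= line that FreeBSD's ifconfig prints for the interface. The following is only a minimal sketch and was not part of the original fix: the function name remaining_offloads is hypothetical, and the capability names it filters for are an assumption about how ifconfig labels checksum, TSO, and LRO offloads.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig <iface>` output on stdin and print any
# offload-related capabilities still listed on the first options= line.
# The sed step extracts the text between < and >, tr puts one flag per line,
# and grep keeps only the (assumed) checksum/TSO/LRO capability names.
# grep exits non-zero when nothing matches, hence the trailing || true.
remaining_offloads() {
    sed -n 's/^[[:space:]]*options=[^<]*<\(.*\)>.*$/\1/p' |
        head -n 1 |
        tr ',' '\n' |
        grep -E '^(RXCSUM|TXCSUM|RXCSUM_IPV6|TXCSUM_IPV6|TSO4|TSO6|LRO|VLAN_HWCSUM|VLAN_HWTSO)$' || true
}

# Usage inside FreeNAS: ifconfig vtnet0 | remaining_offloads
```

If the command prints nothing, none of the listed offloads are still enabled on that interface.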
Full Troubleshooting Steps:
ifconfig | grep mtu
ifconfig | grep mtu
ifconfig | grep MTU
ping google.com
ping <FREENAS_IP>
ping <RANCHER_IP>
ping <HOST_IP>
ping <RANCHER_IP>
ping <HOST_IP>
ping <FREENAS_IP>
sshd is running in the Rancher VM: ps -ef | grep sshd
Restarted sshd: sudo system-docker restart console, in case there was some sort of race condition.
netstat -nl | grep :22
route
netstat -r
route
netstat
Confirmed that just that IP and port were being listened to. This was to rule out any possible port conflicts.
Checked the iptables rules on the Host and Rancher (FreeNAS doesn't have a firewall), and there weren't any rules blocking communication.
ipfw table all list
sudo tcpdump -nnvvS '(src <HOST_IP> and dst <RANCHER_IP>) or (src <RANCHER_IP> and dst <HOST_IP>)'
tcpdump: listening on ix0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:01:53.957264 IP (tos 0x0, ttl 64, id 56881, offset 0, flags [DF], proto TCP (6), length 60)
<HOST_IP>.60648 > <RANCHER_IP>.22: Flags [S], cksum 0xfae8 (correct), seq 468317589, win 64240, options [mss 1460,sackOK,TS val 2321761697 ecr 0,nop,wscale 7], length 0
sudo tcpdump -nnvvS '(src <HOST_IP> and dst <RANCHER_IP>) or (src <RANCHER_IP> and dst <HOST_IP>)'
tcpdump: listening on vtnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:59:03.029922 IP (tos 0x0, ttl 64, id 25421, offset 0, flags [DF], proto TCP (6), length 60)
<HOST_IP>.45688 > <RANCHER_IP>.22: Flags [S], cksum 0x8403 (incorrect -> 0x69a6), seq 3645881181, win 64240, options [mss 1460,sackOK,TS val 1007017042 ecr 0,nop,wscale 7], length 0
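With longer captures, a quick tally of how many packets tcpdump flags can make the pattern easier to spot. This is a rough sketch rather than something from the original troubleshooting: count_bad_cksum is a hypothetical helper, and it assumes the verbose output format shown in the captures here, where a bad checksum is printed as cksum 0x... (incorrect -> 0x...).

```shell
#!/bin/sh
# Hypothetical helper: count lines of verbose tcpdump output (on stdin)
# whose checksum is reported as incorrect.
count_bad_cksum() {
    # grep -c prints the count but exits non-zero when it is 0, hence || true
    grep -c 'cksum 0x[0-9a-f]* (incorrect' || true
}

# Usage: sudo tcpdump -nnvvS -c 50 'host <RANCHER_IP>' | count_bad_cksum
```

Keep in mind that a capture taken on the machine that sends the packets can show incorrect checksums simply because TX checksum offload fills them in after tcpdump sees them, so the count is most telling on the receiving side.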
The cksum field showed incorrect a lot, so I ran ethtool --show-offload <ETHERNET_INTERFACE_NAME> | grep tx-checksumming on the Host, and it told me it was on. I ran sudo ethtool -K <ETHERNET_INTERFACE_NAME> tx off to disable it and re-ran tcpdump and the ssh command, but still got incorrect for cksum, so I re-enabled checksumming with sudo ethtool -K <ETHERNET_INTERFACE_NAME> tx on. At least I thought that last command reset things; after a reboot of FreeNAS the network was no longer reachable. I ended up running sudo ethtool --reset <ETHERNET_INTERFACE_NAME> all, and eventually recreating the VM from scratch and rebooting my system to get things reset.
What finally helped was a post titled "iohyve tap0 or epair", of all things. Quoting the relevant info in case the post disappears at some point.
I ran into a very similar situation recently. I could ping the jails to & from bhyve guests but I could not pass any actual traffic. From other physical devices I had no issue passing traffic. The problem ended up being the hardware offloaders (TSO, HWSUM, etc) were causing the issue, which I found kind of ironic considering the traffic was not making it to the hardware in my case. I used tcpdump and could see the traffic had checksum errors. I turn off the hardware offloaders and everything started working, took me two weeks to figure this out. In hindsight I should of ran tcpdump on the first day.
Try turning off the hardware offloading, then rerun ifconfig -v if it took effect, then test to see if you can pass actual traffic.
Disable hardware offloading:
ifconfig igb0 -rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum
I ran that command in FreeNAS (replacing igb0 with vtnet0), started the Rancher VM back up, and finally tried to SSH into Rancher... and succeeded. Basically, my previous attempt to disable offloading was correct, but I needed to do it within FreeNAS, not the Host, which is a bit counterintuitive to me considering it's a bridged network and I'm passing my exact hardware resources through to the VMs.