dns podman podman-networking

Is intermittent DNS resolution failure using Podman with multiple containers to be expected?


Using rootless Podman with multiple containers that restart periodically, we have noticed intermittent DNS resolution failures (Temporary failure in name resolution) that we can only attribute to Podman's aardvark-dns service. The journald logs show that each time a container shuts down, aardvark-dns receives a SIGHUP:

Feb 19 22:04:46 our_host systemd[590631]: libpod-96f36b743fc774866fb779d8c39e8b111122223333b2c23291a25de1bb2184b9.scope: Consumed 1min 32.870s CPU time.
Feb 19 22:04:46 our_host podman[2159020]: @ - - [19/Feb/2024:22:03:54 +0100] "POST /v1.41/containers/96f36b743fc774866fb779d8c39e8b111122223333b2c23291a25de1bb2184b9/attach?stderr=1&stdin=1&stdout=1&stream=1 HTTP/1.1" 200 0 "" "Docker-Client/unknown-version (linux)"
Feb 19 22:04:46 our_host aardvark-dns[3621325]: Received SIGHUP will refresh servers: 1
Feb 19 22:04:46 our_host kernel: podman1: port 1(veth5) entered disabled state
Feb 19 22:04:46 our_host kernel: device veth5 left promiscuous mode
Feb 19 22:04:46 our_host kernel: podman1: port 1(veth5) entered disabled state
Feb 19 22:04:46 our_host podman[2159020]: @ - - [19/Feb/2024:22:03:54 +0100] "POST /v1.41/containers/96f36b743fc774866fb779d8c39e8b111122223333b2c23291a25de1bb2184b9/wait?condition=removed HTTP/1.1" 200 30 "" "Docker-Client/unknown-version (linux)"
Feb 19 22:04:46 Feb 19 22:04:51 our_host docker-compose[2159010]: java.net.UnknownHostException: www.somehost.com: Temporary failure in name resolution

From my limited understanding, when a container is removed, aardvark-dns is sent a SIGHUP so that it refreshes its state and drops the removed container's information (I'm guessing), and while it does so it is unable to answer DNS requests from the other containers.
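To check whether the failures really cluster around the SIGHUP events, the journal can be scanned for both kinds of lines. The following is only a minimal sketch: it assumes the syslog-style timestamps shown in the excerpt above, a C/English locale for month names, and a hypothetical 10-second window; the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "Feb 19 22:04:46" part of a journald/syslog-style line.
TS = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")


def parse_ts(line, year=2024):
    """Return the timestamp at the start of a log line, or None if absent.

    These log lines carry no year, so one has to be supplied (assumption).
    """
    m = TS.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(0)}", "%Y %b %d %H:%M:%S")


def failures_near_sighup(lines, window_seconds=10, year=2024):
    """Return (timestamp, line) for every resolution failure logged within
    window_seconds after an aardvark-dns SIGHUP."""
    sighups, failures = [], []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        if "aardvark-dns" in line and "Received SIGHUP" in line:
            sighups.append(ts)
        elif "Temporary failure in name resolution" in line:
            failures.append((ts, line))
    window = timedelta(seconds=window_seconds)
    return [
        (ts, line)
        for ts, line in failures
        if any(timedelta(0) <= ts - s <= window for s in sighups)
    ]


if __name__ == "__main__":
    # Sample lines adapted from the excerpt in the question.
    sample = [
        "Feb 19 22:04:46 our_host aardvark-dns[3621325]: Received SIGHUP will refresh servers: 1",
        "Feb 19 22:04:51 our_host docker-compose[2159010]: java.net.UnknownHostException: "
        "www.somehost.com: Temporary failure in name resolution",
    ]
    for ts, line in failures_near_sighup(sample):
        print(ts, line)
```

If most failures show up in the result and almost none fall outside the window, that supports the theory; it does not prove that aardvark-dns is the cause.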

I have trouble believing that this is the intended design: even with 4-5 containers on a 5-10 minute schedule we see multiple DNS failures a day, and since I would expect the number of failures to grow roughly quadratically with the number of containers, I imagine aardvark-dns would be practically unusable with ~10 restarting containers.
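The "quadratic" guess can be made explicit with a back-of-the-envelope model. This is purely an assumption on my part, not something taken from aardvark-dns itself: every restart of one container triggers one SIGHUP, and each of the other n-1 containers has some small probability of issuing a lookup during the refresh. The restart rate and the probability below are hypothetical placeholders.

```python
def affected_pairs(n_containers):
    """Number of (restarting container, potential victim) pairs: n * (n - 1)."""
    return n_containers * (n_containers - 1)


def expected_failures_per_day(n_containers, restarts_per_container_per_day,
                              p_lookup_hits_window):
    """Expected failed lookups per day under the simple model above.

    p_lookup_hits_window is the (assumed) probability that one other
    container issues a lookup while aardvark-dns is refreshing.
    """
    return (affected_pairs(n_containers)
            * restarts_per_container_per_day
            * p_lookup_hits_window)


if __name__ == "__main__":
    restarts = 288   # hypothetical: one restart every 5 minutes
    p_hit = 0.001    # hypothetical probability, not a measured value
    for n in (5, 10):
        print(n, affected_pairs(n), expected_failures_per_day(n, restarts, p_hit))
```

Under these assumptions, going from 5 to 10 containers raises the number of pairs from 20 to 90, so the expected failure count grows by a factor of 4.5 rather than 2.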

We have not set up any DNS-specific configuration, other than adding our own DNS servers to the host's /etc/resolv.conf.


Solution

  • The answer we found is that no, this is not the expected behavior, but it is the actual one: there is a (currently) open bug at https://github.com/containers/aardvark-dns/issues/389 that describes the details.

    The workaround we applied was to mount a read-only /etc/resolv.conf into the containers. This leaves you unable to resolve other containers by name, but the containers then use the resolver settings you specify.

    Another suggested workaround was to use host networking, which again likely means you will be unable to resolve other containers by name. As we did not want to allow containers to use host networking, we did not explore this option further, so I can't speak to whether it would work.
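For illustration, the read-only /etc/resolv.conf workaround could be wired up roughly as follows. This is a sketch under assumptions, not our exact setup: the nameserver address is a placeholder from the documentation range, the image name and file name are hypothetical, and the helper functions are made up. It only relies on podman run's --volume option with the ro mount flag.

```python
import shlex
from pathlib import Path


def resolv_conf_text(nameservers):
    """Build the contents of a minimal resolv.conf from a list of server IPs."""
    return "".join(f"nameserver {ns}\n" for ns in nameservers)


def podman_run_args(image, resolv_conf_path):
    """Build a podman command that bind-mounts resolv_conf_path read-only
    over the container's /etc/resolv.conf."""
    return [
        "podman", "run", "--rm",
        "--volume", f"{resolv_conf_path}:/etc/resolv.conf:ro",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with your own DNS servers and image.
    conf = Path("resolv.conf.static").resolve()
    conf.write_text(resolv_conf_text(["192.0.2.53"]))
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(podman_run_args("example.invalid/our-image:latest", conf)))
```

With this mount in place the container never asks aardvark-dns, which is why container names on the Podman network no longer resolve.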