I've set up a Nomad cluster on two VPSes, one with a Nomad server, Consul server, and Vault, and another with a Nomad and Consul client. On the client node I'm trying to run Nomad jobs, e.g. a Postgres service that should be reachable by other containers on an internal network. So I chose bridge mode like so:
group "myservices" {
count = 1
network {
mode = "bridge"
port "postgrestcp" {
to = 5432
}
}
service {
name = "svc-postgres"
port = "postgrestcp"
tags = ["postgres","primary"]
...
task "task-postgres" {
driver = "docker"
config {
image = "docker.io/postgres:16-alpine"
ports = ["postgrestcp"]
}
...
When I start the job and exec into the started container (nomad exec -task task-postgres -t $(nomad job status myjob-postgres | grep -A2 Alloc | tail -n1 | awk '{ print $1 }') bash), I find that all relevant NOMAD_ADDR... env vars in the container contain my external interface IP instead of the IP of the default bridge (i.e. the one named nomad). That bridge exists and has an IP, which is what I expect to find in the mentioned env vars:
> networkctl status nomad | grep -e Type -e '^\s*Address:'
Type: bridge
Address: 172.26.64.1
No matter what I try, I can't get Nomad to stop referring to the public IP. (I first thought it was because I was using the podman task driver, but changing to docker didn't make a difference.) What's going on here?
In the Nomad network specification, "bridge" network mode means that a bridge is created between the tasks in the group. This is not the Docker "bridge" interface; it means something different. See https://developer.hashicorp.com/nomad/docs/networking#bridge-networking.
Nomad forwards ports on the interface that was configured in the Nomad client configuration. See https://developer.hashicorp.com/nomad/docs/configuration/client#host_network-block and https://developer.hashicorp.com/nomad/docs/job-specification/network#host_network. The default interface is the one used to connect to the outside, i.e. the "external" one. I typically also add host_network "lo" { interface = "lo" } to have localhost available. Consider using a firewall to prevent external traffic from reaching your services. That said, when you have many servers, you typically do want the port on the external interface, so that services on other nodes can reach it.
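A sketch of such a client config (the "internal" name and the wg0 interface are placeholders, adjust to your machine):

client {
  enabled = true

  # Allow jobs to allocate ports on loopback under the name "lo".
  host_network "lo" {
    interface = "lo"
  }

  # Placeholder private interface (e.g. a VPN); name and interface
  # are assumptions.
  host_network "internal" {
    interface = "wg0"
  }
}

A port in the job's network block can then be pinned to one of these names with host_network = "internal".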
Other containers can connect to the service using the IP and port it was assigned. Within the group in the job, use ${NOMAD_ADDR_postgrestcp} to connect to it.
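For example, another task in the same group could reach it like this (a sketch; the task name and the check command are made up):

task "task-pgcheck" {
  driver = "docker"
  config {
    image   = "docker.io/postgres:16-alpine"
    command = "sh"
    # NOMAD_ADDR_postgrestcp expands to "<ip>:<port>" of the port
    # labeled "postgrestcp"; it is also set in the task environment.
    args    = ["-c", "pg_isready -d postgres://${NOMAD_ADDR_postgrestcp}/postgres"]
  }
}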
If not within the group, i.e. from other tasks and anything else, you have to extract that information from Nomad. You can:
- use Consul DNS: the service is resolvable as svc-postgres.service.consul (or under your configured Consul domain, e.g. svc-postgres.service.consul.your.domain.com),
- query the Nomad API, e.g.: nomad operator api /v1/job/yourservice/allocations | jq -r 'map(select(.ClientStatus == "running")) | .[0].ID' | xargs nomad alloc status -json | jq '.Resources.Networks | <find the port with postgrestcp label>'
- use nomad-port, part of https://pypi.org/project/nomad-tools/, which does exactly that: it prints the IP and port.
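For example, assuming Consul's DNS interface listens on its default port 8600 on the node, the first option can be checked with dig:

dig @127.0.0.1 -p 8600 svc-postgres.service.consul SRV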
In our network, we have fabio & Consul. For HTTP services, we just pick a name, add urlprefix-name.service.consul.our.domain to the Consul service tags within the job, and connect to that URL in the browser. For other services, templating is most commonly used to generate the consuming service's configuration, e.g. Airflow connecting to Postgres or Redis.
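A minimal sketch of that templating approach, in the consuming job (the destination path and variable names are made up; the service name is taken from the job above):

template {
  # Render the address of healthy svc-postgres instances from Consul
  # and export the result to the task's environment.
  destination = "local/pg.env"
  env         = true
  data        = <<EOF
{{ range service "svc-postgres" -}}
PGHOST={{ .Address }}
PGPORT={{ .Port }}
{{ end -}}
EOF
}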