I've set up a Nomad cluster on two VPSes, one with a Nomad server, Consul server, and Vault, and another with a Nomad and Consul client. On the client node I'm trying to run Nomad jobs, e.g. a Postgres service that should be reachable by other containers on an internal network. So I chose bridge mode like so:
group "myservices" {
count = 1
network {
mode = "bridge"
port "postgrestcp" {
to = 5432
}
}
service {
name = "svc-postgres"
port = "postgrestcp"
tags = ["postgres","primary"]
...
task "task-postgres" {
driver = "docker"
config {
image = "docker.io/postgres:16-alpine"
ports = ["postgrestcp"]
}
...
When I start the job and exec into the started container (nomad exec -task task-postgres -t $(nomad job status myjob-postgres | grep -A2 Alloc | tail -n1 | awk '{ print $1 }') bash), I find that all relevant NOMAD_ADDR... env vars in the container contain my external interface IP instead of the IP of the default bridge (i.e. the one named nomad). That bridge exists and has an IP, which is what I expect to find in the mentioned env vars:
> networkctl status nomad | grep -e Type -e '^\s*Address:'
Type: bridge
Address: 172.26.64.1
No matter what I try, I can't get Nomad to stop referring to the public IP. (I first thought it was because I was using the podman task driver, but changing to docker didn't make a difference.) What's going on here?
In the Nomad network specification, "bridge" network mode means that a bridge is created between the tasks in the group. This is not the Docker "bridge" interface; it means something different. See https://developer.hashicorp.com/nomad/docs/networking#bridge-networking.
Nomad forwards ports on the interface that was configured in the Nomad client configuration. See https://developer.hashicorp.com/nomad/docs/configuration/client#host_network-block and https://developer.hashicorp.com/nomad/docs/job-specification/network#host_network. The default interface is the one used to connect to the outside, i.e. the "external" one. I typically also add host_network "lo" { interface = "lo" } to have localhost available. Consider using a firewall to prevent external traffic from reaching your services. That said, when you have many servers, you typically do want the port on the external interface, so that services on other nodes can reach it.
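A sketch of such a client config (the "internal" name and the wg0 interface are placeholders, adjust to your machine):

client {
  enabled = true

  # Allow jobs to allocate ports on loopback under the name "lo".
  host_network "lo" {
    interface = "lo"
  }

  # Placeholder private interface (e.g. a VPN); name and interface
  # are assumptions.
  host_network "internal" {
    interface = "wg0"
  }
}

A port in the job's network block can then be pinned to one of these names with host_network = "internal".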
Other containers can connect to the service using the IP and port it was assigned. Within the group in the job, use ${NOMAD_ADDR_postgrestcp} to connect to it.
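For example, another task in the same group could reach it like this (a sketch; the task name and the check command are made up):

task "task-pgcheck" {
  driver = "docker"
  config {
    image   = "docker.io/postgres:16-alpine"
    command = "sh"
    # NOMAD_ADDR_postgrestcp expands to "<ip>:<port>" of the port
    # labeled "postgrestcp"; it is also set in the task environment.
    args    = ["-c", "pg_isready -d postgres://${NOMAD_ADDR_postgrestcp}/postgres"]
  }
}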
If not within the group, i.e. from other tasks and anything else, you have to extract that information from Nomad. You can:
- use Consul DNS: the service is resolvable as svc-postgres.service.consul (or under your configured Consul domain, e.g. svc-postgres.service.consul.your.domain.com),
- query the Nomad API, e.g.: nomad operator api /v1/job/yourservice/allocations | jq -r 'map(select(.ClientStatus == "running")) | .[0].ID' | xargs nomad alloc status -json | jq '.Resources.Networks | <find the port with postgrestcp label>'
- use nomad-port, part of https://pypi.org/project/nomad-tools/, which does exactly that: it prints the IP and port.
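For example, assuming Consul's DNS interface listens on its default port 8600 on the node, the first option can be checked with dig:

dig @127.0.0.1 -p 8600 svc-postgres.service.consul SRV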
In our network, we have fabio & Consul. For HTTP services, we just pick a name, add urlprefix-name.service.consul.our.domain to the Consul service tags within the job, and connect to that URL in the browser. For other services, templating is most commonly used to generate the consuming service's configuration, e.g. Airflow connecting to Postgres or Redis.
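A minimal sketch of that templating approach, in the consuming job (the destination path and variable names are made up; the service name is taken from the job above):

template {
  # Render the address of healthy svc-postgres instances from Consul
  # and export the result to the task's environment.
  destination = "local/pg.env"
  env         = true
  data        = <<EOF
{{ range service "svc-postgres" -}}
PGHOST={{ .Address }}
PGPORT={{ .Port }}
{{ end -}}
EOF
}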