traefikconsulnomad

Nomad High Availability with Traefik


I've decided to give Nomad a try, and I'm setting up a small environment for side projects in my company.

Although the documentation on Nomad/Consul is nice and detailed, they don't reach the simple task of exposing a small web service to the world.

Following this official tutorial to use Traefik as a load balancer, how can I make those exposed services reachable?

The tutorial has a footnote stating that the services could be accessed from outside the cluster by port 8080.

But in a cluster where I have 3 servers and 3 clients, where should I point my DNS to? Should a DNS with failover pointing to the 3 clients be enough? Do I still need a load balancer for the clients?


Solution

  • There are multiple ways you could handle distributing the requests across your servers. Some may be more preferable than the other depending on your deployment environment.

    The Fabio load balancer docs have a section on deployment configurations which I'll use as a reference.

    Direct with DNS failover

    In this model, you could configure DNS to point to the IPs of all three servers. Clients would receive all three IPs back in response to a DNS query, and randomly connect to one of the available instances.

    If an IP is unhealthy, the client should retry the request to one of the other IPs, but clients may experience slower response times if a server is unavailable for an extended period of time and the client is occasionally routing requests to that unavailable IP.

    You can mitigate this issue by configuring your DNS server to perform health checking of backend instances (assuming it supports it). AWS Route 53 provides this functionality (see Configuring DNS failover). If your DNS server does not support health checking, but provides an API to update records, you can use Consul Terraform Sync to automate adding/removing server IPs as the health of the Fabio instances changes in Consul.

    Fabio behind a load balancer

    As you mentioned the other option would be to place Fabio behind a load balancer. If you're deploying in the cloud, this could be the cloud provider's LB. The LB would give you better control over traffic routing to Fabio, provide TLS/SSL termination, and other functionality.

    If you're on-premises, you could front it with any available load balancer like F5, A10, nginx, Apache Traffic Server, etc. You would need to ensure the LB is deployed in a highly available manner. Some suggestions for doing this are covered in the next section.

    Direct with IP failover

    Whether you're running Fabio directly on the Internet, or behind a load balancer, you need to make sure the IP which clients are connecting to is highly available.

    If you're deploying on-premises, one method for achieving this would be to assign a common loopback IP each of the Fabio servers (e.g., 192.0.2.10), and then use an L2 redundancy protocol like Virtual Router Redundancy Protocol (VRRP) or an L3 routing protocol like BGP to ensure the network routes requests to available instances.

    L2 failover

    Keepalived is a VRRP daemon for Linux. There can find many tutorials online for installing and configure in.

    L3 failover w/ BGP

    GoCast is a BGP daemon built on GoBGP which conditionally advertises IPs to the upstream network based on the state of health checks. The author of this tool published a blog post titled BGP based Anycast as a Service which walks through deploying GoCast on Nomad, and configuring it to use Consul for health information.

    L3 failover with static IPs

    If you're deploying on-premises, a more simple configuration than the two aforementioned solutions might be to configure your router to install/remove static routes based on health checks to your backend instances. Cisco routers support this through their IP SLA feature. This tutorial walks through a basic setup configuration http://www.firewall.cx/cisco-technical-knowledgebase/cisco-routers/813-cisco-router-ipsla-basic.html.

    As you can see, there are many ways to configure HA for Fabio or an upstream LB. Its hard to provide a good recommendation without knowing more about your environment. Hopefully one of these suggestions will be useful to you.