kubernetes, high-availability, etcd, fault-tolerance, k3s

Fault tolerance in HA k3s setup


I'm trying to set up an HA k3s cluster with embedded etcd as the datastore, using some VMs running on a server and a few Raspberry Pis.

I think I understand the concepts behind Kubernetes, and k3s specifically, but there is one thing I do not understand: how many servers (etcd, control-plane) can go offline while the cluster still functions. I've tried finding more information on this topic, but nothing I found seems to answer my question:

If I have 3 servers in the cluster, will the cluster still function with only a single server online and the other 2 offline?


Solution

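  • I know nothing about k3s and only a little about k8s, but from the etcd perspective, https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance answers your question quite well.

    TL;DR:

    You need a majority of the servers in the cluster to be online for the cluster to work, so you need at least 3 servers to survive one server going offline. With 3 servers the majority (quorum) is 2, which means that with only a single server online and the other 2 offline, etcd loses quorum and the cluster will not function until a second server comes back.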
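
The majority rule above can be sketched as a few lines of arithmetic. This is only an illustration of the quorum formula described in the etcd FAQ, not anything k3s or etcd ships; the function names `quorum`, `failure_tolerance`, and `has_quorum` are made up for this example.

```python
def quorum(members: int) -> int:
    """Smallest number of servers that forms a majority of the cluster."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many servers can be offline while a majority is still online."""
    return members - quorum(members)


def has_quorum(members: int, online: int) -> bool:
    """True if the servers that are still online form a majority."""
    return online >= quorum(members)


if __name__ == "__main__":
    # Print quorum size and failure tolerance for small cluster sizes.
    for n in range(1, 8):
        print(f"servers={n} quorum={quorum(n)} tolerated_failures={failure_tolerance(n)}")

    # The case from the question: 3 servers, only 1 still online.
    print("3 servers, 1 online, has quorum:", has_quorum(3, 1))
```

Note that an even number of servers adds no tolerance over the odd number below it: 4 servers need 3 online, so they still survive only one failure, the same as 3 servers.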