I am trying to enable a multi-cluster CockroachDB spanning 3 k8s clusters connected with Cilium Cluster Mesh. The idea of a multi-cluster CockroachDB is described on cockroachlabs.com - 1, 2. However, the article calls for changing the CoreDNS ConfigMap instead of using Cilium global services, which feels suboptimal.
Therefore the question arises: how can a multi-cluster CockroachDB be enabled in a Cilium Cluster Mesh environment, using Cilium global services instead of hacking the CoreDNS ConfigMap?
When CockroachDB is installed via Helm, it deploys a StatefulSet with a carefully crafted --join parameter that contains the FQDNs of the CockroachDB pods that are to join the cluster.
The pod FQDNs come from service.discovery, which is created with clusterIP: None and
(...) only exists to create DNS entries for each pod in the StatefulSet such that they can resolve each other's IP addresses.
The discovery service automatically registers DNS entries for all pods within the StatefulSet, so that they can be easily referenced
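For context, the chart's service.discovery is roughly a headless Service along these lines (a sketch inferred from the manifests and names used later in this question; a release named dbs in the dbs namespace is assumed):

apiVersion: v1
kind: Service
metadata:
  # assumed release name "dbs" in namespace "dbs", matching the manifests below
  name: dbs-cockroachdb
  namespace: dbs
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None                  # headless: creates per-pod DNS records instead of a VIP
  publishNotReadyAddresses: true   # pods are resolvable before they become ready
  ports:
    - name: grpc
      port: 26257
      targetPort: grpc
    - name: http
      port: 8080
      targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb

With this in place, pod dbs-cockroachdb-0 resolves as dbs-cockroachdb-0.dbs-cockroachdb.dbs.svc.cluster.local, i.e. the short form dbs-cockroachdb-0.dbs-cockroachdb.dbs used in the --join list below.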
Can a similar discovery service or alternative be created for a StatefulSet running on a remote cluster? So that, with Cluster Mesh enabled, pods J, K, L in cluster B could be reached from pods X, Y, Z in cluster A by their FQDNs?
As suggested in create-service-per-pod-in-statefulset, one could create services like
{{- range $i, $_ := until 3 -}}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  name: dbs-cockroachdb-remote-{{ $i }}
  namespace: dbs
spec:
  ports:
    - name: grpc
      port: 26257
      protocol: TCP
      targetPort: grpc
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
    statefulset.kubernetes.io/pod-name: cockroachdb-{{ $i }}
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
---
kind: Service
apiVersion: v1
metadata:
  name: dbs-cockroachdb-public-remote-{{ $i }}
  namespace: dbs
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  annotations:
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
spec:
  ports:
    - name: grpc
      port: 26257
      protocol: TCP
      targetPort: grpc
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
{{- end -}}
so that they resemble the original service.discovery and service.public.
However, despite the presence of the Cilium annotations
io.cilium/global-service: 'true'
service.cilium.io/affinity: "remote"
the services appear to be bound to the local k8s cluster, resulting in a CockroachDB cluster of 3 nodes instead of 6 (3 in cluster A + 3 in cluster B).
It does not matter which service (dbs-cockroachdb-public-remote-X or dbs-cockroachdb-remote-X) I use in my --join override:
join:
- dbs-cockroachdb-0.dbs-cockroachdb.dbs:26257
- dbs-cockroachdb-1.dbs-cockroachdb.dbs:26257
- dbs-cockroachdb-2.dbs-cockroachdb.dbs:26257
- dbs-cockroachdb-public-remote-0.dbs:26257
- dbs-cockroachdb-public-remote-1.dbs:26257
- dbs-cockroachdb-public-remote-2.dbs:26257
The result is the same: 3 nodes instead of 6.
Any ideas?
Apparently, due to 7070, patching the CoreDNS ConfigMap is the most reasonable thing we can do. In the comments of that issue, an article is mentioned that provides additional context.
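For reference, the approach described there boils down to adding a stanza per remote cluster to the local Corefile that forwards queries for that cluster's domain to its DNS endpoint; the shape is roughly the following (placeholders, not the article's exact snippet):

[REMOTE_CLUSTER_DOMAIN] {
    errors
    cache 30
    forward . [REMOTE_DNS_IP]
}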
My twist on this story is that I updated the ConfigMap with a kubernetes plugin config:
apiVersion: v1
data:
  Corefile: |-
    saturn.local {
        log
        errors
        kubernetes saturn.local {
            endpoint https://[ENDPOINT]
            kubeconfig [PATH_TO_KUBECONFIG]
        }
    }
    rhea.local {
        ...
This way I can resolve the other clusters' names as well.
In my setup, each cluster has its own domain.local. PATH_TO_KUBECONFIG is a plain kubeconfig file. A generic Secret has to be created in the kube-system namespace, and that Secret has to be mounted as a volume in the coredns Deployment.
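A minimal sketch of that wiring, assuming a Secret named coredns-remote-kubeconfig with key kubeconfig (both names hypothetical), created with kubectl -n kube-system create secret generic coredns-remote-kubeconfig --from-file=kubeconfig=<file>, and then referenced from the coredns Deployment roughly like this:

# fragment of the coredns Deployment spec; Secret name and mount path are assumptions
spec:
  template:
    spec:
      containers:
        - name: coredns
          volumeMounts:
            - name: remote-kubeconfig
              mountPath: /etc/coredns/remote   # [PATH_TO_KUBECONFIG] would then be /etc/coredns/remote/kubeconfig
              readOnly: true
      volumes:
        - name: remote-kubeconfig
          secret:
            secretName: coredns-remote-kubeconfig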