I am using containers to run both app servers & Cassandra nodes.
When starting the app server container, I need to specify which Cassandra node(1..n) to connect to. How would you divide the workload?
This is for a production setup, 100 % uptime. Each data load from cassandra is small but many.
I should be scalable so I can put in more app containers - like in Kubernetes they have pods. Pods is a set of nodes that make up granules of the application.
Therefore I am looking for the best possible group of containers(Cassandra and App server) that will scale
Info: Kubernetes is a to expensive setup in the beginning. And while waiting for Docker Swarm to be in release state I will do this manually. Any insight is welcome?
Regards
Don't run the app container and Cassandra node inside of the same pod. You want to be able to scale your Cassandra cluster independently of your application.
For the Cassandra side of things, I suggest:
You will need to have DNS working in your Kubernetes cluster.
cassandra-replication-controller.yml
apiVersion: v1
kind: ReplicationController
metadata:
labels:
name: cassandra
name: cassandra
spec:
replicas: 1
selector:
name: cassandra
template:
metadata:
labels:
name: cassandra
spec:
containers:
- image: vyshane/cassandra
name: cassandra
env:
# Feel free to change the following:
- name: CASSANDRA_CLUSTER_NAME
value: Cassandra
- name: CASSANDRA_DC
value: DC1
- name: CASSANDRA_RACK
value: Kubernetes Cluster
- name: CASSANDRA_ENDPOINT_SNITCH
value: GossipingPropertyFileSnitch
# The peer discovery domain needs to point to the Cassandra peer service
- name: PEER_DISCOVERY_DOMAIN
value: cassandra-peers.default.cluster.local.
ports:
- containerPort: 9042
name: cql
volumeMounts:
- mountPath: /var/lib/cassandra/data
name: data
volumes:
- name: data
emptyDir: {}
The Cassandra service is pretty simple. Add the thrift port if you need that.
cassandra-service.yml
apiVersion: v1
kind: Service
metadata:
labels:
name: cassandra
name: cassandra
spec:
ports:
- port: 9042
name: cql
selector:
name: cassandra
This is a headless Kubernetes service that provides the IP addresses of Cassandra peers via DNS A records. The peer service definition looks like this:
cassandra-peer-service.yml
apiVersion: v1
kind: Service
metadata:
labels:
name: cassandra-peers
name: cassandra-peers
spec:
clusterIP: None
ports:
- port: 7000
name: intra-node-communication
- port: 7001
name: tls-intra-node-communication
selector:
name: cassandra
We extend the official Cassandra image thus:
Dockerfile
FROM cassandra:2.2
MAINTAINER Vy-Shane Xie <shane@node.mu>
ENV REFRESHED_AT 2015-09-16
RUN apt-get -qq update && \
DEBIAN_FRONTEND=noninteractive apt-get -yq install dnsutils && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
COPY custom-entrypoint.sh /
ENTRYPOINT ["/custom-entrypoint.sh"]
CMD ["cassandra", "-f"]
Notice the custom-entrypoint.sh
script. It simply configures the seed nodes by querying our Cassandra peer discovery service:
custom-entrypoint.sh
#!/bin/bash
#
# Configure Cassandra seed nodes.
my_ip=$(hostname --ip-address)
CASSANDRA_SEEDS=$(dig $PEER_DISCOVERY_DOMAIN +short | \
grep -v $my_ip | \
sort | \
head -2 | xargs | \
sed -e 's/ /,/g')
export CASSANDRA_SEEDS
/docker-entrypoint.sh "$@"
To start Cassandra, simply run
kubectl create -f cassandra-peer-service.yml
kubectl create -f cassandra-service.yml
kubectl create -f cassandra-replication-controller.yml
This will give you a one-node Cassandra cluster. To add another node:
kubectl scale rc cassandra --replicas=2
Your application pods can connect to Cassandra using the cassandra
hostname. It points to the Cassandra service.
I made a GitHub repo with the above setup: Multinode Cassandra Cluster on Kubernetes.