Tags: cassandra, docker, kubernetes, gocql

App container to Cassandra node - one to one, or?


I am using containers to run both app servers & Cassandra nodes.

When starting the app server container, I need to specify which Cassandra node (1..n) to connect to. How would you divide the workload?

  1. One app container to one or more Cassandra nodes (how many?).
  2. One or more app containers to one Cassandra node (how many?).
  3. Many to many (how many?).

This is for a production setup with 100% uptime. Each data load from Cassandra is small, but there are many of them.

It should be scalable so I can add more app containers - like pods in Kubernetes, where a pod is a group of containers that makes up a granule of the application.
Therefore I am looking for the best possible grouping of containers (Cassandra and app server) that will scale.

Info: Kubernetes is too expensive a setup in the beginning, and while waiting for Docker Swarm to reach a release state I will do this manually. Any insight is welcome.

Regards


Solution

  • Don't run the app container and the Cassandra node inside the same pod. You want to be able to scale your Cassandra cluster independently of your application.

    For the Cassandra side of things, I suggest:

    You will need to have DNS working in your Kubernetes cluster.

    The Cassandra Replication Controller

    cassandra-replication-controller.yml

    apiVersion: v1
    kind: ReplicationController
    metadata:
      labels:
        name: cassandra
      name: cassandra
    spec:
      replicas: 1
      selector:
        name: cassandra
      template:
        metadata:
          labels:
            name: cassandra
        spec:
          containers:
            - image: vyshane/cassandra
              name: cassandra
              env:
                # Feel free to change the following:
                - name: CASSANDRA_CLUSTER_NAME
                  value: Cassandra
                - name: CASSANDRA_DC
                  value: DC1
                - name: CASSANDRA_RACK
                  value: Kubernetes Cluster
                - name: CASSANDRA_ENDPOINT_SNITCH
                  value: GossipingPropertyFileSnitch
    
                # The peer discovery domain needs to point to the Cassandra peer service
                - name: PEER_DISCOVERY_DOMAIN
                  value: cassandra-peers.default.cluster.local.
              ports:
                - containerPort: 9042
                  name: cql
              volumeMounts:
                - mountPath: /var/lib/cassandra/data
                  name: data
          volumes:
            - name: data
              emptyDir: {}
    

    The Cassandra Service

    The Cassandra service is pretty simple. Add the Thrift port (9160) if you need it.

    cassandra-service.yml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: cassandra
      name: cassandra
    spec:
      ports:
        - port: 9042
          name: cql
      selector:
        name: cassandra
    

    The Cassandra Peer Discovery Service

    This is a headless Kubernetes service that provides the IP addresses of Cassandra peers via DNS A records. The peer service definition looks like this:

    cassandra-peer-service.yml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: cassandra-peers
      name: cassandra-peers
    spec:
      clusterIP: None
      ports:
        - port: 7000
          name: intra-node-communication
        - port: 7001
          name: tls-intra-node-communication
      selector:
        name: cassandra
    
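    To see what this service exposes, you can resolve it from any pod in the cluster. The following minimal Go sketch (not part of the original setup) performs the same lookup that the custom entrypoint below does with dig; it assumes the default namespace and the cluster.local domain used in PEER_DISCOVERY_DOMAIN:

    package main

    import (
        "fmt"
        "log"
        "net"
    )

    func main() {
        // The headless service has no cluster IP, so a DNS lookup returns one
        // A record per Cassandra pod behind it.
        ips, err := net.LookupHost("cassandra-peers.default.cluster.local")
        if err != nil {
            log.Fatal(err)
        }
        for _, ip := range ips {
            fmt.Println("Cassandra peer:", ip)
        }
    }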

    The Cassandra Docker Image

    We extend the official Cassandra image thus:

    Dockerfile

    FROM cassandra:2.2
    MAINTAINER Vy-Shane Xie <shane@node.mu>
    ENV REFRESHED_AT 2015-09-16
    
    RUN apt-get -qq update && \
        DEBIAN_FRONTEND=noninteractive apt-get -yq install dnsutils && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    COPY custom-entrypoint.sh /
    ENTRYPOINT ["/custom-entrypoint.sh"]
    CMD ["cassandra", "-f"]
    

    Notice the custom-entrypoint.sh script. It simply configures the seed nodes by querying our Cassandra peer discovery service:

    custom-entrypoint.sh

    #!/bin/bash
    #
    # Configure Cassandra seed nodes.
    
    # IP address of this pod; assumes the pod has a single address.
    my_ip=$(hostname --ip-address)
    
    # Resolve the peer discovery domain, drop our own address, and use up to
    # two of the remaining peers as the seed list.
    CASSANDRA_SEEDS=$(dig "$PEER_DISCOVERY_DOMAIN" +short | \
        grep -v "$my_ip" | \
        sort | \
        head -2 | xargs | \
        sed -e 's/ /,/g')
    
    export CASSANDRA_SEEDS
    
    /docker-entrypoint.sh "$@"
    

    Starting Cassandra

    To start Cassandra, simply run

    kubectl create -f cassandra-peer-service.yml
    kubectl create -f cassandra-service.yml
    kubectl create -f cassandra-replication-controller.yml
    

    This will give you a one-node Cassandra cluster. To add another node:

    kubectl scale rc cassandra --replicas=2
    

    Talking to Cassandra

    Your application pods can connect to Cassandra using the cassandra hostname, which points to the Cassandra service.
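
    As a concrete example, here is a minimal Go sketch using the gocql driver from the question's tags. It is only an illustration: the keyspace name my_keyspace and the sanity-check query are placeholders, not part of the setup above.

    package main

    import (
        "fmt"
        "log"

        "github.com/gocql/gocql"
    )

    func main() {
        // "cassandra" is the Kubernetes service defined above; it resolves
        // inside the cluster and fronts the Cassandra pods on port 9042.
        cluster := gocql.NewCluster("cassandra")
        cluster.Port = 9042
        cluster.Keyspace = "my_keyspace" // placeholder keyspace
        cluster.Consistency = gocql.Quorum

        session, err := cluster.CreateSession()
        if err != nil {
            log.Fatal(err)
        }
        defer session.Close()

        // Sanity check: read the server's release version.
        var release string
        if err := session.Query("SELECT release_version FROM system.local").Scan(&release); err != nil {
            log.Fatal(err)
        }
        fmt.Println("Connected to Cassandra", release)
    }

    Depending on the gocql version, the driver can also discover the rest of the ring after this initial contact point; check its host discovery settings if you rely on that.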

    Show me the code

    I made a GitHub repo with the above setup: Multinode Cassandra Cluster on Kubernetes.