cassandra docker-compose cqlsh

Init script for Cassandra with docker-compose


I would like to create keyspaces and column-families at the start of my Cassandra container.

I tried the following in a docker-compose.yml file:

# shortened for clarity
cassandra:
    hostname: my-cassandra
    image: my/cassandra:latest
    command: "cqlsh -f init-database.cql"

The image my/cassandra:latest contains init-database.cql in /. But this does not seem to work.

Is there a way to make this happen?


Solution

  • We recently tried to solve a similar problem in KillrVideo, a reference application for Cassandra. We use Docker Compose to spin up the environment the application needs, which includes a DataStax Enterprise (i.e. Cassandra) node. We wanted that node to do some bootstrapping the first time it started, to install the CQL schema (using cqlsh to run the statements in a .cql file, just like you're trying to do). The approach we took was to write a shell script for our Docker entrypoint that:

    1. Starts the node normally but in the background.
    2. Waits until port 9042 is available (this is where clients connect to run CQL statements).
    3. Uses cqlsh -f to run some CQL statements and init the schema.
    4. Stops the node that's running in the background.
    5. Continues on to the usual entrypoint for our Docker image that starts up the node normally (in the foreground like Docker expects).

    We just use the existence of a file to indicate whether the node has already been bootstrapped and check that on startup to determine whether we need to do that logic above or can just start it normally. You can see the results in the killrvideo-dse-docker repository on GitHub.
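
    The steps above, together with the marker-file check, can be sketched roughly as follows. This is a minimal illustration, not the script from the KillrVideo repository: the marker path, the function names, and the assumption that the wrapper receives the usual foreground start command as its arguments (for example `cassandra -f`) are all hypothetical. The readiness check uses cqlsh itself to probe the node, since cqlsh connects to port 9042 by default.

```shell
#!/bin/sh
# Sketch of an entrypoint wrapper. All names and paths here are assumptions.
set -eu

MARKER="${MARKER:-/var/lib/cassandra/.schema-initialized}"  # hypothetical marker file
INIT_CQL="${INIT_CQL:-/init-database.cql}"                  # CQL file from the question

# Succeeds (exit 0) while the marker file does not exist yet.
needs_bootstrap() {
    [ ! -f "$MARKER" ]
}

# Poll until cqlsh can connect (port 9042 by default), for up to $1 attempts.
wait_for_cql() {
    attempts="${1:-60}"
    while [ "$attempts" -gt 0 ]; do
        if cqlsh -e 'DESCRIBE KEYSPACES' >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 2
    done
    return 1
}

# Run the start command in the background, load the schema, then stop the node.
bootstrap_schema() {
    "$@" &                        # step 1: start the node in the background
    node_pid=$!
    status=1
    if wait_for_cql 60 && cqlsh -f "$INIT_CQL"; then   # steps 2 and 3
        status=0
    fi
    kill "$node_pid"              # step 4: send SIGTERM to the background node
    wait "$node_pid" || true      # a killed process reports a non-zero status
    if [ "$status" -eq 0 ]; then
        touch "$MARKER"           # remember that the schema is installed
    fi
    return "$status"
}

# Only act when a start command was passed, e.g. `cassandra -f`.
if [ "$#" -gt 0 ]; then
    if needs_bootstrap; then
        bootstrap_schema "$@"
    fi
    exec "$@"                     # step 5: start the node in the foreground
fi
```

    For the marker to survive container restarts, it would need to live on a persistent volume, which is why this sketch places it next to the data directory.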

    There is one caveat to this approach. It worked well for us because our reference application only spins up a single node (i.e. we aren't creating a cluster with more than one node). If you're running multiple nodes, you'll probably want to make sure that only one of them does the bootstrapping to create the schema, because multiple clients modifying the schema simultaneously can cause problems in your cluster. (This is a known issue and will hopefully be fixed at some point.)
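
    One simple way to restrict bootstrapping to a single node is an opt-in flag that you set on exactly one service in docker-compose.yml. The sketch below assumes a hypothetical environment variable named BOOTSTRAP_SCHEMA; neither Cassandra nor Docker defines it, and an entrypoint script would have to check it before running any schema logic.

```shell
#!/bin/sh
# Hypothetical opt-in flag: set BOOTSTRAP_SCHEMA=true on exactly one service
# in docker-compose.yml and leave it unset on all other nodes.
is_schema_node() {
    [ "${BOOTSTRAP_SCHEMA:-false}" = "true" ]
}

if is_schema_node; then
    echo "this node installs the schema"
else
    echo "skipping schema bootstrap on this node"
fi
```

    This only decides which node creates the schema. The other nodes would still have to wait for the schema to appear before serving the application.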