cassandra, akka-persistence

Akka Persistence Cassandra: database initialisation


I am wondering what the best pattern is for initialising the akka-persistence database (keyspaces and tables) in Cassandra.

Currently, I am using a custom Docker image with the database already created. However, I do not want to rely on Docker and would like to use a standalone, non-containerized instance.

The simplest pattern would be to connect to that database and execute the scripts manually before starting the application. But I would like a more automatic solution that also works easily in local environments, where the database may be deleted many times a day.

So I am considering an initialization script/class that connects to the database and executes the scripts. Sadly, I cannot find any simple way to get the Cassandra connection from my (Play Framework) application.

How do you initialize the database in your akka-persistence systems?


Solution

  • The first thing to note is that automatic Cassandra schema creation should not be done in production, at least not in a setup where it could run from multiple machines. There's a very good chance that the keyspace(s) (plural if you also use Akka Persistence Cassandra as a snapshot store) will be created simultaneously on multiple nodes of the Cassandra cluster, even when using IF NOT EXISTS, with different schema UUIDs. Cassandra will eventually reconcile them and determine that the schemas are actually the same, but in the meantime there can be unpredictable data loss.

    Akka Persistence Cassandra used to create the keyspace/tables automatically, but because of this issue it no longer does so by default. It will still create them if configured to (e.g. via application.conf or a JVM system property): set akka.persistence.cassandra.journal.keyspace-autocreate and akka.persistence.cassandra.snapshot.keyspace-autocreate to true, along with the corresponding tables-autocreate settings. The schema created this way is intended for local development (e.g. it uses a replication factor of 1 by default).
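    As a sketch of keeping auto-creation out of production, the four autocreate settings can be layered on top of the normal configuration only in a local environment. The object name, the hypothetical APP_ENV environment variable and the use of Typesafe Config's ConfigFactory to build the overrides are assumptions for illustration; the keys are the journal and snapshot keyspace-autocreate/tables-autocreate settings.

    ```scala
    import com.typesafe.config.{ Config, ConfigFactory }

    object CassandraAutoCreateConfig {
      // Local-dev only: lets Akka Persistence Cassandra create keyspaces and tables on startup.
      val autoCreateOverrides: Config = ConfigFactory.parseString(
        """
          |akka.persistence.cassandra {
          |  journal {
          |    keyspace-autocreate = true
          |    tables-autocreate = true
          |  }
          |  snapshot {
          |    keyspace-autocreate = true
          |    tables-autocreate = true
          |  }
          |}
          |""".stripMargin)

      // Hypothetical switch: apply the overrides only when the environment is "local";
      // otherwise return the normal application.conf/reference.conf configuration unchanged.
      def forEnvironment(env: Option[String]): Config = {
        val base = ConfigFactory.load()
        if (env.contains("local")) autoCreateOverrides.withFallback(base) else base
      }
    }
    ```

    Usage would be along the lines of ActorSystem("my-app", CassandraAutoCreateConfig.forEnvironment(sys.env.get("APP_ENV"))). In a Play application the equivalent is usually a separate local config file that includes application.conf and adds these four settings.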

    You can set the journal's keyspace name with akka.persistence.cassandra.journal.keyspace, and the replication factor used when auto-creating it with akka.persistence.cassandra.journal.replication-factor.
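    For illustration, here is how those settings might look for a three-node cluster, together with a hypothetical helper that renders roughly the CREATE KEYSPACE statement they correspond to (handy if you would rather run the script manually). The keyspace names and the replication factor of 3 are placeholders, the snapshot keys assume the snapshot section mirrors the journal's, and the CQL is a simplified SimpleStrategy form, not necessarily the exact statement the plugin generates.

    ```scala
    import com.typesafe.config.{ Config, ConfigFactory }

    object ProductionKeyspaceSettings {
      // Placeholder values: choose keyspace names and a replication factor that fit your cluster.
      val overrides: Config = ConfigFactory.parseString(
        """
          |akka.persistence.cassandra {
          |  journal {
          |    keyspace = "my_app_journal"
          |    replication-factor = 3
          |  }
          |  snapshot {
          |    keyspace = "my_app_snapshot"
          |    replication-factor = 3
          |  }
          |}
          |""".stripMargin)

      // Hypothetical helper: builds a simplified CREATE KEYSPACE statement for the
      // given section ("journal" or "snapshot") from the keyspace and replication-factor keys.
      def createKeyspaceCql(config: Config, section: String): String = {
        val path = s"akka.persistence.cassandra.$section"
        val keyspace = config.getString(s"$path.keyspace")
        val rf = config.getInt(s"$path.replication-factor")
        s"CREATE KEYSPACE IF NOT EXISTS $keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': $rf}"
      }
    }
    ```

    For multi-datacenter clusters you would typically want NetworkTopologyStrategy instead of SimpleStrategy, which is one more reason to review these settings before enabling auto-creation anywhere but locally.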

    If you can be sure that you're only creating the schema from a single instance of the application at a time, and you will use nodetool describecluster to confirm schema agreement before starting any other instances, you can enable auto-creation in production. If you do, be careful to set the replication-factor properly.
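    If the instance that creates the schema also holds a driver session, a programmatic alternative to checking nodetool describecluster by hand is to poll the DataStax Java driver 4.x method Session.checkSchemaAgreement() until it returns true. The sketch below takes the check as a plain function so it does not depend on a live cluster; the object name, the timeout and the polling interval are assumptions.

    ```scala
    import scala.annotation.tailrec
    import scala.concurrent.duration._

    object SchemaAgreement {
      /** Calls `check` until it returns true or `timeout` elapses; returns whether agreement was reached. */
      def awaitAgreement(check: () => Boolean, timeout: FiniteDuration, interval: FiniteDuration = 500.millis): Boolean = {
        val deadline = timeout.fromNow
        @tailrec def loop(): Boolean =
          if (check()) true
          else if (deadline.isOverdue()) false
          else {
            Thread.sleep(interval.toMillis) // back off before asking the cluster again
            loop()
          }
        loop()
      }
    }
    ```

    For example, SchemaAgreement.awaitAgreement(() => cqlSession.checkSchemaAgreement(), 30.seconds) could gate the signal that other instances may start; a false result means the nodes still disagree and you should not proceed.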

    It's also possible to build a separate project, depending on Akka Persistence Cassandra, whose only job is to create the schema. The most straightforward approach is something like this:

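    package akka.gervaisb // must be under `akka` to access the private[akka] classes below
    
    import scala.concurrent.Await
    import scala.concurrent.duration.Duration
    import akka.actor.ActorSystem
    import akka.event.Logging
    import akka.persistence.cassandra.{ CassandraStatements, PluginSettings }
    import com.datastax.oss.driver.api.core.CqlSession
    
    object CreateCassandraSchema {
      def main(args: Array[String]): Unit = {
        val actorSystem = ActorSystem("cassandra-tables")
    
        // implicit ExecutionContext used by the Futures below
        import actorSystem.dispatcher
    
        // check Cassandra docs on how to create a CqlSession
        // (e.g. CqlSession.builder() from the DataStax Java driver 4.x):
        val cqlSession: CqlSession = ???
    
        // reads keyspace names, replication factor etc. from akka.persistence.cassandra
        val cassandraPluginSettings = PluginSettings(actorSystem)
        val cassandraStatements = new CassandraStatements(cassandraPluginSettings)
    
        val schemaCreationFuture =
          cassandraStatements.executeAllCreateKeyspaceAndTables(cqlSession, Logging(actorSystem, getClass))
    
        // shut down on success and on failure, otherwise the Await below would block forever
        schemaCreationFuture.onComplete { result =>
          result.failed.foreach(_.printStackTrace())
          cqlSession.close()
          actorSystem.terminate()
        }
    
        Await.ready(actorSystem.whenTerminated, Duration.Inf)
      }
    }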
    

    Because this uses a couple of classes that are private[akka] (PluginSettings and CassandraStatements), the main method needs to live in a package within the akka hierarchy, e.g. akka.gervaisb. Doing this effectively opts you in to dealing with changes to those internals whenever you upgrade Akka Persistence Cassandra (APC), though it is worth checking for schema changes when upgrading APC anyway.