kubernetes, apache-kafka, apache-kafka-streams, stateful

Kafka Streams state store - what kind of store to use when running in Kubernetes


Do I understand correctly that a Kafka Streams state store is co-located with the Kafka Streams (KS) application instance? For example, if my KS application is running in a Kubernetes pod, is the state store located in the same pod? Which state store is better to use in Kubernetes, RocksDB or in-memory? How can the type of state store be configured in the application?


Solution

  • This depends on your use case. An in-memory store can be acceptable when the topic is small, but in most cases you'll default to a persistent store. To declare one you'd do:

    streamsBuilder.addStateStore(
            Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore(storeName),
                    keySerde,
                    valueSerde
            )
    );
    

    If you want an in-memory store instead, replace Stores.persistentKeyValueStore with Stores.inMemoryKeyValueStore.

    Running a Kafka Streams application in Kubernetes has a few caveats. First of all, a bare pod is not enough: its local storage is lost when the pod is replaced, so the stores would typically have to be rebuilt from their changelog topics on every restart. You'll need to run the application as a StatefulSet. In that case your pod will have a PersistentVolumeClaim mounted under a certain path. It's best to set your state.dir property to point at a subfolder of that path. That way the volume is retained when the pod shuts down, and when the pod comes back up it will find all of its stores already present.
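
As a rough illustration of the state.dir advice, the sketch below builds the configuration with plain java.util.Properties so that it runs without the Kafka libraries on the classpath. The keys "state.dir", "application.id" and "bootstrap.servers" are the standard Kafka Streams property names; everything else (the STATE_VOLUME_PATH environment variable, the /var/lib/kafka-streams fallback, the kafka-streams-state subfolder, the application id and the broker address) is an assumption made up for this example, not something from the setup described above.

```java
import java.nio.file.Paths;
import java.util.Properties;

public class StateDirConfig {

    // Builds a Kafka Streams configuration whose state.dir points at a
    // subfolder of the mounted volume. All argument values are supplied by
    // the caller; the subfolder name is a hypothetical choice.
    static Properties buildStreamsProperties(String volumeMountPath,
                                             String applicationId,
                                             String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // StreamsConfig.APPLICATION_ID_CONFIG
        props.put("bootstrap.servers", bootstrapServers); // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        // StreamsConfig.STATE_DIR_CONFIG: keep the stores on the persistent volume
        props.put("state.dir",
                Paths.get(volumeMountPath, "kafka-streams-state").toString());
        return props;
    }

    public static void main(String[] args) {
        // Assumption: the pod spec exports the volumeMount path in this variable.
        String mountPath = System.getenv()
                .getOrDefault("STATE_VOLUME_PATH", "/var/lib/kafka-streams");
        Properties props =
                buildStreamsProperties(mountPath, "my-streams-app", "kafka:9092");
        System.out.println("state.dir = " + props.getProperty("state.dir"));
        // In a real application these props would be passed on to
        // new KafkaStreams(topology, props).
    }
}
```

Kafka Streams creates a directory per application.id underneath state.dir, so pointing state.dir at a dedicated subfolder keeps the store files separate from anything else on the volume.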