Do I understand correctly that Kafka Streams state store is co-located with the KS application instance? For example if my KS application is running in a Kubernetes pod, the state store is located in the same pod? What state store storage is better to use in Kubernetes - RocksDB or in-memory? How can the type of the state store be configured in the application?
This depends on your use case - sometimes you can accept an in-memory store when you have a small topic. However in most cases you'll default to a persistent stores. To declare one you'd do:
streamsBuilder.addStateStore(
Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore(
storeName
),
keySerde,
valueSerde
)
);
If you wish for a inMemoryStore replace the third line with inMemoryKeyValueStore
.
Running a KafkaStreams application in k8s has a few caveats. First of all just a pod is not enough. You'll need to run this as a stateful-set. In that case your pod will have a PersistentVolumeClaim mounted on your pod under a certain path. It's best to set your state.dir
property to point at a subfolder of that path. That way when your pod shuts down the volume is retained and when the pod comes back on it will have all of its store present.