Using this as a reference, a stream of profile updates is stored in a KTable object.
I am thinking about storing updates for data that is rarely updated. If an instance crashes and another instance has to rebuild that data from scratch, it is possible it will never get that data again, because those updates will never be streamed again, or, put simply, only very rarely.
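For reference, a minimal sketch of what I have in mind (topic, store name, and serdes are placeholders), building the KTable from a topic of profile updates:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ProfileTableApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-table-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Latest profile per user id, kept in a local state store that is
        // backed by a compacted changelog topic managed by Kafka Streams.
        KTable<String, String> profiles = builder.table(
                "profile-updates",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("profiles-store"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```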
The KTable is backed by a topic, so what is retained depends on that topic's retention and cleanup policies.
If the cleanup policy is compact, then each unique key is stored "forever", or until the broker runs out of space, whichever is sooner.
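For example, the source topic can be created with compaction enabled up front. A sketch using the AdminClient, where the topic name, partition count, and replication factor are assumptions:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // cleanup.policy=compact keeps the latest record per key instead of
            // deleting records by age, so rarely-updated keys are not lost.
            NewTopic topic = new NewTopic("profile-updates", 6, (short) 3)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```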
If you run multiple instances, then each KTable holds only a subset of the data, from the partitions that instance consumes; no single table has all the data.
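If you need to read a value held by another instance, interactive queries can tell you which instance hosts the active copy of a key. A rough sketch (Kafka 2.5+ interactive-query API; it assumes each instance sets application.server and the store name used above):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.state.HostInfo;

public class KeyLocator {
    // Returns host:port of the instance holding the active copy of userId,
    // so a caller can forward the lookup there (e.g. over REST).
    static String locate(KafkaStreams streams, String userId) {
        KeyQueryMetadata metadata = streams.queryMetadataForKey(
                "profiles-store", userId, Serdes.String().serializer());
        HostInfo active = metadata.activeHost();
        return active.host() + ":" + active.port();
    }
}
```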
If any instance crashes or is moved without persistent storage configured, it will need to read all data from the beginning of its changelog topic, but you can configure standby replicas to account for that scenario.
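A sketch of the relevant Streams settings (path and counts are placeholders): point state.dir at persistent storage and set num.standby.replicas so another instance keeps a warm copy of each store:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class FaultToleranceConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-table-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep RocksDB state on a persistent volume so a restarted instance
        // does not need to replay the whole changelog topic.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        // Keep one warm copy of each local store on another instance, so a
        // failed task can be migrated without a long restore.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```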
More info at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management