I am trying to create a Kafka topology and break it down into more readable. I have a stream that I group by keys, and then I am trying to window it like so:
SessionWindowedKStream<byte[], byte[]> windowedTable =
groupedStream.windowedBy(SessionWindows.with(Duration.ofSeconds(config.joinWindowSeconds)).grace(Duration.ZERO));
KTable<Windowed<byte[]>, byte[]> mergedTable = windowedTable
.reduce((aggregateValue, newValue) -> {
try {
Map<String, String> recentMap = MAPPER.readValue(new String(newValue), HashMap.class);
Map<String, String> aggregateMap = MAPPER.readValue(new String(newValue), HashMap.class);
aggregateMap.forEach(recentMap::putIfAbsent);
newValue = MAPPER.writeValueAsString(recentMap).getBytes();
} catch (Exception e) {
LOG.warn("Couldn't aggregate key grouped stream\n", e);
}
return newValue;
}, Materialized.with(Serdes.ByteArray(), Serdes.ByteArray()));
mergedTable.toStream()
.foreach((externalId, eventIncidentByteMap) -> {
...
}
Unfortunately, the following exception is thrown:
00:40:11.344 [main] ERROR o.a.k.s.p.i.ProcessorStateManager - stream-thread [main] task [0_0] Failed to flush state store KSTREAM-REDUCE-STATE-STORE-0000000020:
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-REDUCE-STATE-STORE-0000000020.1589846400000 at location /tmp/kafka-streams/test-consumer/0_0/KSTREAM-REDUCE-STATE-STORE-0000000020/KSTREAM-REDUCE-STATE-STORE-0000000020.1589846400000
at org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:220)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:191)
at org.apache.kafka.streams.state.internals.KeyValueSegment.openDB(KeyValueSegment.java:49)
at org.apache.kafka.streams.state.internals.KeyValueSegments.getOrCreateSegment(KeyValueSegments.java:50)
at org.apache.kafka.streams.state.internals.KeyValueSegments.getOrCreateSegment(KeyValueSegments.java:25)
at org.apache.kafka.streams.state.internals.AbstractSegments.getOrCreateSegmentIfLive(AbstractSegments.java:84)
at org.apache.kafka.streams.state.internals.AbstractRocksDBSegmentedBytesStore.put(AbstractRocksDBSegmentedBytesStore.java:146)
at org.apache.kafka.streams.state.internals.RocksDBSessionStore.put(RocksDBSessionStore.java:81)
at org.apache.kafka.streams.state.internals.RocksDBSessionStore.put(RocksDBSessionStore.java:25)
at org.apache.kafka.streams.state.internals.ChangeLoggingSessionBytesStore.put(ChangeLoggingSessionBytesStore.java:74)
at org.apache.kafka.streams.state.internals.ChangeLoggingSessionBytesStore.put(ChangeLoggingSessionBytesStore.java:33)
at org.apache.kafka.streams.state.internals.CachingSessionStore.putAndMaybeForward(CachingSessionStore.java:90)
at org.apache.kafka.streams.state.internals.CachingSessionStore.lambda$initInternal$0(CachingSessionStore.java:73)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:151)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:109)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:124)
at org.apache.kafka.streams.state.internals.CachingSessionStore.flush(CachingSessionStore.java:230)
at org.apache.kafka.streams.state.internals.WrappedStateStore.flush(WrappedStateStore.java:84)
at org.apache.kafka.streams.state.internals.MeteredSessionStore.lambda$flush$5(MeteredSessionStore.java:227)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:806)
at org.apache.kafka.streams.state.internals.MeteredSessionStore.flush(MeteredSessionStore.java:227)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:282)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:177)
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:554)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:490)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:478)
at org.apache.kafka.streams.TopologyTestDriver.completeAllProcessableWork(TopologyTestDriver.java:517)
at org.apache.kafka.streams.TopologyTestDriver.pipeRecord(TopologyTestDriver.java:472)
at org.apache.kafka.streams.TopologyTestDriver.pipeRecord(TopologyTestDriver.java:806)
at org.apache.kafka.streams.TestInputTopic.pipeInput(TestInputTopic.java:115)
at org.apache.kafka.streams.TestInputTopic.pipeInput(TestInputTopic.java:137)
at com.ro.revelon.pub.api.dp.EventConsumerTest.testEventWithIncident(EventConsumerTest.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: org.rocksdb.RocksDBException: You have to open all column families. Column families not opened: keyValueWithTimestamp
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:286)
at org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:217)
... 53 common frames omitted
I am not exactly sure if the issue is with the Serdes that aren't specified somewhere. I did use .groupByKey(Grouped.with(Serdes.ByteArray(), Serdes.ByteArray()))
when grouping by key. I suspect that I haven't mapped something properly along the way.
Caused by: org.rocksdb.RocksDBException: You have to open all column families. Column families not opened: keyValueWithTimestamp
is also suspicious and mysterious to me. Either way, I am not sure how to tackle the problem.
I know that the following code does works:
KTable<byte[], byte[]> mergedTable = groupedStream
.reduce((aggregateValue, newValue) -> {
try {
Map<String, String> recentMap = MAPPER.readValue(new String(newValue), HashMap.class);
Map<String, String> aggregateMap = MAPPER.readValue(new String(newValue), HashMap.class);
aggregateMap.forEach(recentMap::putIfAbsent);
newValue = MAPPER.writeValueAsString(recentMap).getBytes();
} catch (Exception e) {
LOG.warn("Couldn't aggregate key grouped stream\n", e);
}
return newValue;
}, Materialized.with(Serdes.ByteArray(), Serdes.ByteArray()));
mergedTable.toStream()
.foreach((externalId, eventIncidentByteMap) -> {
...
}
How do I break it down without tripping on the rocksdb store exception?
Did you downgrade your Kafka Streams library? In 2.3.0, the storage format was changed and this new storage format is not compatible to older Kafka Streams versions.
If you want to downgrade from a version 2.3.0 (or higher) to a version 2.2.x (or lower), you need to wipe out your local state first (eg, manually deleting the application state directory or via KafkaStreams#cleanup()
). On restart, the state will be rebuild from the changelog topic using the old storage format.