I'm using the Debezium engine to sync data from a MySQL database. Because I'm using the embedded engine, I use org.apache.kafka.connect.storage.FileOffsetBackingStore to record my current change offset. I think my computer had a power outage recently, which corrupted my offset file. When I try to run my Debezium engine app now, I get this error from Debezium:
ERROR io.debezium.embedded.EmbeddedEngine - Unable to configure and start the 'org.apache.kafka.connect.storage.FileOffsetBackingStore' offset backing store
org.apache.kafka.connect.errors.ConnectException: java.io.StreamCorruptedException: invalid stream header: 00000000
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:86)
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.start(FileOffsetBackingStore.java:59)
    at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:691)
    at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:192)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.StreamCorruptedException: invalid stream header: 00000000
    at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:987)
    at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:414)
    at org.apache.kafka.connect.util.SafeObjectInputStream.<init>(SafeObjectInputStream.java:48)
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:71)
    ... 8 common frames omitted
I'd like to fix this the proper way, by repairing the Debezium offset file, but I don't know where to begin. I think I first need to work out which offset I want: the one at which the engine was failing, or one shortly before it. I may be able to recover that offset from the corrupted file, but if not, can I use a timestamp to find a good offset to start from? Once I know the offset, it looks like I can use this tool to point the file at it: https://github.com/nathan-smit-1/HashmapEditor. How, though, do I get a list of offsets in chronological order so I know which one I should change it to?
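To check whether anything can still be recovered from the file, a small inspection program may help. The sketch below is only an illustration, not part of Debezium or Kafka: the class name OffsetFileInspector, its helper methods, and the default file name offsets.dat are all made up for this example. It relies on two things: a Java serialization stream starts with the bytes AC ED 00 05 (which is why a header of 00000000 is rejected), and FileOffsetBackingStore writes a serialized map of byte[] keys to byte[] values. It assumes those byte arrays are readable as UTF-8 text, and it uses a plain ObjectInputStream, so it should only be run on a file you trust.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a Debezium or Kafka class.
class OffsetFileInspector {

    /** True if the bytes start with the Java serialization magic (AC ED) and version (00 05). */
    public static boolean hasSerializationHeader(byte[] head) {
        return head.length >= 4
                && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED
                && (head[2] & 0xFF) == 0x00 && (head[3] & 0xFF) == 0x05;
    }

    /** Deserializes the file as a Map and decodes byte[] keys and values as UTF-8 text. */
    public static Map<String, String> readOffsets(Path file) throws IOException, ClassNotFoundException {
        // Plain ObjectInputStream: only use this on a file you trust.
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            Object obj = in.readObject();
            if (!(obj instanceof Map)) {
                throw new IOException("Expected a Map but found: " + obj);
            }
            Map<String, String> decoded = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) obj).entrySet()) {
                decoded.put(text(entry.getKey()), text(entry.getValue()));
            }
            return decoded;
        }
    }

    // Assumption: keys and values are text; anything else falls back to String.valueOf.
    private static String text(Object value) {
        return value instanceof byte[]
                ? new String((byte[]) value, StandardCharsets.UTF_8)
                : String.valueOf(value);
    }

    public static void main(String[] args) throws Exception {
        // "offsets.dat" is a placeholder; pass the real offset file path as the first argument.
        Path file = Path.of(args.length > 0 ? args[0] : "offsets.dat");
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(4);
        }
        if (!hasSerializationHeader(head)) {
            System.out.println("Header is not AC ED 00 05; the file cannot be deserialized as-is.");
            return;
        }
        readOffsets(file).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

If the header check fails, as the 00000000 in the error above suggests it would here, the stored offset is probably not recoverable from this file and has to be determined some other way.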
After a lot of trial and error and online research, I found that the best solution was the following steps:
*-*-*-*-*-*-*-*-*-*-* UPDATE *-*-*-*-*-*-*-*-*-*-*
Debezium now includes an offset editor in its examples repository. You can find it at the link below, and it works really well. Follow steps 1, 2 and 3 above as before, and for step 4 use this editor instead.
https://github.com/debezium/debezium-examples/tree/main/offset-editor