
How to read Kafka record ingestion timestamp in Apache Beam


I am new to Apache Beam and have been struggling with this problem for a while. I am using KafkaIO as the source of my pipeline in the Apache Beam Java SDK. I want to fetch the Kafka ingestion timestamp along with every record and write it as an additional column to my output. By ingestion timestamp I mean the time at which the record was ingested into Kafka, not the event time.

I cannot figure out how to use the KafkaIO reader without calling withoutMetadata(). As far as I understand, the Kafka record ingestion timestamp should be part of the metadata for each record. Is that right?


Solution

  • If you specify a timestamp policy, you should be able to access the timestamp of each resulting element in your DoFn, regardless of whether you read the metadata. You can then do whatever you like with it (e.g. store it in a field of a POJO).
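The approach described above could look roughly like the sketch below. It assumes the topic is configured with message.timestamp.type=LogAppendTime, so that the timestamp policy selected by withLogAppendTime() reflects the broker-side ingestion time. The broker address, the topic name, the class names, and the comma-separated output format are hypothetical placeholders rather than anything taken from the question.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Instant;

public class KafkaIngestionTimestampExample {

  // Hypothetical output format: key, value, and ingestion time (epoch millis) as one CSV row.
  static String toRow(String key, String value, long ingestionMillis) {
    return key + "," + value + "," + ingestionMillis;
  }

  // Reads the element timestamp assigned by the source's timestamp policy
  // and appends it to the record as an extra column.
  static class AppendTimestampFn extends DoFn<KV<String, String>, String> {
    @ProcessElement
    public void processElement(
        @Element KV<String, String> record,
        @Timestamp Instant timestamp,
        OutputReceiver<String> out) {
      out.output(toRow(record.getKey(), record.getValue(), timestamp.getMillis()));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply(
            KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092") // assumption: local broker
                .withTopic("my-topic") // hypothetical topic name
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Timestamp policy: use Kafka's log append time as the element timestamp.
                // Assumes the topic uses message.timestamp.type=LogAppendTime.
                .withLogAppendTime()
                .withoutMetadata())
        .apply(ParDo.of(new AppendTimestampFn()));
    // Attach whatever sink you need to the resulting PCollection<String> here.

    pipeline.run().waitUntilFinish();
  }
}
```

If you would rather keep the metadata, leaving out withoutMetadata() should give you KafkaRecord elements instead, which expose getTimestamp() and getTimestampType() alongside getKV(). Checking the timestamp type is a reasonable way to confirm that the value really is the log append time and not the producer's create time.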