bigdata, apache-flink, apache-beam, dataflow, amazon-kinesis-analytics

Recalculate historical data using Apache Beam


I have an Apache Beam streaming project that processes data and writes the results to a database. What is the best way to reprocess all historical records, without a long delay, after a bug fix or a change to the processing logic?


Solution

  • It depends heavily on the application.

    For example, here is a straightforward approach if you are using Kafka (and all of the data is still retained there):