amazon-web-servicesamazon-dynamodbamazon-kcl

Kinesis client library record processor failure


According to AWS docs:

The worker invokes record processor methods using Java ExecutorService tasks. If a task fails, the worker retains control of the shard that the record processor was processing. The worker starts a new record processor task to process that shard. For more information, see Read Throttling.

According to another page on AWS docs:

The Kinesis Client Library (KCL) relies on your processRecords code to handle any exceptions that arise from processing the data records. Any exception thrown from processRecords is absorbed by the KCL. To avoid infinite retries on a recurring failure, the KCL does not resend the batch of records processed at the time of the exception. The KCL then calls processRecords for the next batch of data records without restarting the record processor. This effectively results in consumer applications observing skipped records. To prevent skipped records, handle all exceptions within processRecords appropriately.

Aren't these 2 contradictory statements? One says that record processor restarts and another says that the shard is skipped. What does KCL exactly do when a record processor fails? How does a KCL worker comes to know if a record processor failed?


Solution

  • Based on my experience writing, debugging, and supporting KCL-based applications, the second statement is more clear/accurate/useful for describing how you should consider error handling.

    First, a bit of background:

    The first statement is trying (not very clearly) to say that when a KCL task fails, that instance of the worker will keep control of the shards it's processing (and not transfer them to another worker).