azure-cosmosdb, azure-cosmosdb-changefeed

ChangeFeedProcessorBuilder checkpointing after unsuccessful processing


I was investigating the behavior of a ChangeFeedProcessorBuilder processor that throws an exception or goes down while processing a particular change. Upon recovery, the same change is not picked up anymore. Is there any way to checkpoint only after the notification has been processed successfully?

The delegate is as follows:

    // `a` is a static counter (declared here for completeness); only the
    // first invocation of the delegate throws.
    static int a = 0;

    var builder = container.GetChangeFeedProcessorBuilder<object>("migrationProcessor",
        (IReadOnlyCollection<object> input, CancellationToken cancellationToken) =>
        {
            Console.WriteLine(input.Count + " Changes Received by " + a);
            // only the first try will fail (static counter)
            if (a++ == 0)
            {
                throw new Exception();
            }
            return Task.CompletedTask;
        });

Thank you!


Solution

  • The default behavior of the Change Feed Processor is to checkpoint after a successful delegate execution: https://learn.microsoft.com/azure/cosmos-db/change-feed-processor#processing-life-cycle

    The normal life cycle of a host instance is:

    1. Read the change feed.
    2. If there are no changes, sleep for a predefined amount of time (customizable with WithPollInterval in the Builder, see the sketch after this list) and go to #1.
    3. If there are changes, send them to the delegate.
    4. When the delegate finishes processing the changes successfully, update the lease store with the latest processed point in time and go to #1.
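
    For illustration, a minimal sketch of how the processor might be wired up and started (names such as leaseContainer, consoleHost, and HandleChangesAsync are assumptions, not part of the question); the poll interval from step 2 is set through WithPollInterval:

        // Sketch only: assumes `container` and `leaseContainer` are existing Container
        // instances and `HandleChangesAsync` is a changes handler like the question's delegate.
        ChangeFeedProcessor processor = container
            .GetChangeFeedProcessorBuilder<object>("migrationProcessor", HandleChangesAsync)
            .WithInstanceName("consoleHost")            // identifies this host instance
            .WithLeaseContainer(leaseContainer)         // lease store where checkpoints are written
            .WithPollInterval(TimeSpan.FromSeconds(5))  // sleep time used in step 2
            .Build();

        await processor.StartAsync();   // begins the read -> delegate -> checkpoint loop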

    If your delegate handler throws an unhandled exception, there is no checkpoint.
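
    One way to make sure the checkpoint only ever happens after successful processing is to handle failures inside the delegate, retrying transient errors and dead-lettering items you give up on, and only letting an exception bubble up when you explicitly want the batch to be re-delivered. A rough sketch, where ProcessItemAsync and SendToDeadLetterAsync are hypothetical application methods, not SDK APIs:

        // Sketch only: retry each item inside the delegate so the lease is
        // checkpointed only after the whole batch has been handled.
        async Task HandleChangesAsync(IReadOnlyCollection<object> changes, CancellationToken cancellationToken)
        {
            foreach (var item in changes)
            {
                for (int attempt = 1; ; attempt++)
                {
                    try
                    {
                        await ProcessItemAsync(item, cancellationToken); // hypothetical application logic
                        break;
                    }
                    catch (Exception) when (attempt < 3)
                    {
                        // transient failure: back off briefly and retry the same item
                        await Task.Delay(TimeSpan.FromSeconds(attempt), cancellationToken);
                    }
                    catch (Exception ex)
                    {
                        // out of retries: dead-letter the item (hypothetical) instead of rethrowing,
                        // so the rest of the batch can still be processed and checkpointed;
                        // rethrow here instead if you want the whole batch re-delivered (no checkpoint).
                        await SendToDeadLetterAsync(item, ex, cancellationToken);
                        break;
                    }
                }
            }
        }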

    Adding from the comments: the only scenario where the batch might not be retried is when the batch that throws is the first one ever processed (the lease has no Continuation), because when the host picks up the lease again to reprocess, it has no point in time to retry from. Based on the official documentation, one lease is owned by a single instance, so there is no way another instance could have picked up the same lease and be processing it in parallel (within the same Deployment Unit context).