The Change Feed Processor options are well described here.
I have a few questions about them:
leaseRenewInterval: Suppose an instance cannot renew its lease within 17s (the default lease renew interval). Will the lease be removed from that instance, or will the processor wait until leaseExpirationInterval has elapsed before removing the lease, giving the instance a chance to reacquire it within 60s?
Does the lease renewal happen after a checkpoint by default, or are the two independent? That is, can the lease renewal run on a separate thread every leaseRenewInterval while another thread is still working on a batch?
We have seen the error: failed to checkpoint for owner 'null' with continuation token.
How can this happen? Why would the owner become null?
We have also seen the exception LeaseLostException. Can this happen even if the pod/instance is not down? We are not expecting any load balancing because there is only 1 physical partition, but we want our system to be fault tolerant, so we run multiple instances, all but one of which will always be waiting to acquire the lease.
In a few cases we can see 3 pods/instances holding the lease for the same physical partition at the same time; in other words, they all acquired the same lease. (We have at most 1 physical partition: the document TTL is 3 days and the storage volume is small, so we do not expect more than 1.) How can this happen?
EDITS:
Current Settings:
leaseRenewInterval: 17s
leaseAcquireInterval: 13s
leaseExpirationInterval: 60s
feedPollDelay: 2s [only this is not the default]
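To put the settings above in perspective, here is a minimal sketch in plain Java (no SDK calls) of how the renew and expiration intervals relate. `LeaseTimings` and its methods are hypothetical names made up for illustration. The sketch also assumes that a lease counts as expired once more than leaseExpirationInterval has passed since its last renewal, which is how the option is usually described; that has not been confirmed for the Java CFP in 4.8.0.

```java
import java.time.Duration;

// Illustrative helper only; this class is NOT part of the azure-cosmos SDK.
final class LeaseTimings {

    // Number of whole renew intervals that fit inside one expiration window.
    static long renewAttemptsBeforeExpiry(Duration renewInterval, Duration expirationInterval) {
        if (renewInterval.isZero() || renewInterval.isNegative()) {
            throw new IllegalArgumentException("renewInterval must be positive");
        }
        return expirationInterval.toMillis() / renewInterval.toMillis();
    }

    // Assumption: a lease is treated as expired once the time since its last
    // successful renewal exceeds the expiration interval.
    static boolean isExpired(Duration sinceLastRenew, Duration expirationInterval) {
        return sinceLastRenew.compareTo(expirationInterval) > 0;
    }

    public static void main(String[] args) {
        Duration renew = Duration.ofSeconds(17);      // leaseRenewInterval
        Duration expiration = Duration.ofSeconds(60); // leaseExpirationInterval

        System.out.println(renewAttemptsBeforeExpiry(renew, expiration));
        System.out.println(isExpired(Duration.ofSeconds(17), expiration));
        System.out.println(isExpired(Duration.ofSeconds(61), expiration));
    }
}
```

With 17s and 60s, three whole renew intervals fit into one expiration window, so under the stated assumption a single missed renewal would not by itself mark the lease as expired.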
Change Feed Processor version:
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-cosmos</artifactId>
    <version>4.8.0</version>
</dependency>
So, I can assume the CFP version is 4.8.0
Please share which CFP version you are using and which options you have set. Normally, unless you are very certain about what you are doing, I don't recommend changing any of the intervals.
EDIT: Based on the new information. I am not familiar with the Java CFP, but when there are more instances than leases, a lease being load balanced across instances, while not ideal, shouldn't be a problem, because the lease will still be owned and processed by only 1 machine at a time. The only recommendation I have is to try the latest Maven package version. Newer versions include CFP fixes (https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-java-v4#4140-2021-04-06), so try 4.15.0.