xquery marklogic

What is the retry-count property of a request in MarkLogic?


I can't find much documentation or any settings in MarkLogic for the <retry-count> property of a request. I can see this value go up sometimes, and it looks like the retries come from within MarkLogic. The requests hang for a long time on the app server without timing out, and the retry-count climbs into the tens. I have to kill the request from the admin panel. There are no correlated deadlocks or other errors in the logs that I could investigate, and there seems to be no setting that controls the retry-count.

In our case a retried request produces bad data: a request can be retried multiple times, mid-batch (we process anywhere between 1 and 100 docs per request), which causes already processed documents to be reprocessed. This messes up our data, because our code does not support this kind of mid-batch retry. The only way we can protect our data flows is to check the retry-count value of the current request and throw an error if it is greater than 0. Not perfect, but it works.
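A minimal sketch of that guard, assuming the retry count is exposed as a retry-count child element of the xdmp:request-status output (a namespace wildcard is used so the sketch does not depend on the exact namespace). The error name BATCH-RETRIED is a made-up placeholder, and the caller needs whatever privilege xdmp:request-status requires:

```xquery
xquery version "1.0-ml";

(: Status of the request that is executing right now. :)
let $status := xdmp:request-status(xdmp:host(), xdmp:server(), xdmp:request())

(: Assumption: the status element carries a retry-count child.
   Fall back to 0 if it is absent. :)
let $retries := xs:integer(($status/*:retry-count, 0)[1])

return
  if ($retries gt 0)
  then
    (: BATCH-RETRIED is a hypothetical error name for this sketch. :)
    fn:error(
      xs:QName("BATCH-RETRIED"),
      fn:concat("Request was retried ", $retries, " time(s); refusing to reprocess the batch"))
  else
    (: Placeholder: the real batch processing would go here. :)
    ()
```

The check runs at the top of the module, so a retried run fails before it touches any documents.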

Does anyone know why MarkLogic internally retries some requests? Can the retrying be turned off?


Solution

  • Internal retries are most often due to deadlocks.

    If multiple transactions take a (shared) read lock on the same URI and then at some point promote it to an (exclusive) write lock, only one of those transactions will "win" and obtain the write lock. All of the other transactions holding a read lock on that URI will be bumped and forced to retry, at which point they will see the result of the transaction that "won". MarkLogic handles this automatically and retries.

    When you say that you did not see correlated deadlocks in the logs, what log level is your ErrorLog set to?

    https://docs.marklogic.com/11.0/messages/XDMP-en/XDMP-DEADLOCK

    If your issue is deadlocks, you should see it get worse (longer-running transactions, maybe even some hitting limits and failing as unresolvable) when you increase the number of threads and the batch size. Conversely, you should see things improve when you reduce the threads and batch size, because fewer transactions are "fighting" and there are fewer retries, which allows transactions to complete faster.

    There is a Group setting for Retry Timeout: https://docs.marklogic.com/admin:group-set-retry-timeout
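    As a rough sketch, that setting can be changed through the Admin API along these lines. The group name "Default" and the value 60 are placeholders chosen for illustration, and the value is assumed to be in seconds; check the linked documentation before relying on either:

    ```xquery
    xquery version "1.0-ml";

    import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

    (: Load the current configuration and look up the group.
       "Default" is an assumed group name. :)
    let $config := admin:get-configuration()
    let $group-id := admin:group-get-id($config, "Default")

    (: 60 is an illustrative value, assumed to be seconds. :)
    let $config := admin:group-set-retry-timeout($config, $group-id, 60)

    (: Nothing changes until the modified configuration is saved. :)
    return admin:save-configuration($config)
    ```

    Note that this limits how long MarkLogic keeps retrying a request; it does not turn retries off.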