
Optimistic concurrency issue in SolrCloud


I am using Solr v7.7.1 in cloud mode. I am facing an issue related to optimistic concurrency:

I have a nested document that can be updated several times concurrently before the updates are committed. During indexing, we fetch the document we want to modify along with its _version_, modify it, and send it back to Solr with that same _version_. If more than one update happens before a commit, the following error is thrown:

Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://1.2.3.4:8983/solr/mcollection_shard1_replica_n2: version conflict for 1111 expected=1645085633861910528 actual=1645090791527284737

The error above says that we tried to index the document with id 1111 before a previous version of that document had been indexed and committed. The obvious fix is to commit all pending updates and then retry indexing the new document. However, Solr returns the same error, with the same version numbers, even after committing. What could the issue be?

Strangely, the problem does not occur when Solr is not running in cloud mode.
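For reference, the fetch, modify, and resend cycle described above can be modeled with a small in-memory sketch. It is not Solr code: the class name `VersionCheckModel`, the counter standing in for Solr's version clock, and the `actual=-1` shown for a missing document are assumptions made for illustration. The rules it encodes are the documented `_version_` semantics: a value greater than 1 must match the stored version exactly, 1 requires the document to exist, a negative value requires it not to exist, and 0 disables the check.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the _version_ rules; a sketch, not Solr's implementation. */
final class VersionCheckModel {
    // id -> version of the latest accepted update (committed or not)
    private final Map<String, Long> versions = new HashMap<>();
    // Hypothetical counter standing in for Solr's version clock
    private long clock = 100L;

    /** Applies an update carrying the requested _version_ and returns the new version. */
    public long update(String id, long requested) {
        Long actual = versions.get(id);
        boolean ok;
        if (requested > 1) {
            ok = actual != null && actual == requested; // must match exactly
        } else if (requested == 1) {
            ok = actual != null;                        // document must exist
        } else if (requested < 0) {
            ok = actual == null;                        // document must not exist
        } else {
            ok = true;                                  // 0: no check
        }
        if (!ok) {
            throw new IllegalStateException("version conflict for " + id
                    + " expected=" + requested
                    + " actual=" + (actual == null ? -1L : actual));
        }
        long next = ++clock;
        versions.put(id, next);
        return next;
    }

    public static void main(String[] args) {
        VersionCheckModel model = new VersionCheckModel();
        long v1 = model.update("1111", -1L); // create: document must not exist yet
        long v2 = model.update("1111", v1);  // first update with the fetched version
        try {
            model.update("1111", v1);        // second update reuses the stale version
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("latest version: " + v2);
    }
}
```

In this model, a second update only succeeds if it carries the version produced by the first one, whether or not a commit has happened in between.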


Solution

  • This appears to be a very specific issue that occurs in Solr when nested documents are used.

    When a document is indexed with a _version_ specified, Solr checks the version of the latest existing document by doing a real-time get. The real-time get reads from the update log, so it also sees data that is not yet visible to search. For this, Solr does something like the following:

    http://1.2.3.4:8983/solr/mcollection/get?id=1111

    Now suppose you have two nested documents: in one (doc1) the parent has id=1111, and in the other (doc2) a child has id=1111. Solr may then check the version of doc2 when you intended to index doc1. This is probably because Solr still indexes all documents in a flat structure and does not consider the parent-child relationship when doing the real-time get.

    The solution is to give parent and child documents ids that differ from each other.

    The bug has been reported: https://issues.apache.org/jira/browse/SOLR-13785
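
The id clash and the workaround can be illustrated with a small sketch. Everything in it is hypothetical: the `Doc` class, the `ownersById` helper, and the `_child_` naming scheme are illustrative choices, not part of Solr or SolrJ, and only one level of nesting is handled. The first helper shows which document blocks contain a given id when all documents are viewed as a flat list; the second rewrites child ids so that they no longer coincide with any parent id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedIdExample {
    /** Hypothetical minimal document: an id plus one level of children. */
    static final class Doc {
        final String id;
        final List<Doc> children = new ArrayList<>();
        Doc(String id) { this.id = id; }
        Doc child(String childId) { children.add(new Doc(childId)); return this; }
    }

    /** Maps every id, parent or child, to the root ids of the blocks containing it. */
    static Map<String, List<String>> ownersById(List<Doc> roots) {
        Map<String, List<String>> owners = new LinkedHashMap<>();
        for (Doc root : roots) {
            owners.computeIfAbsent(root.id, k -> new ArrayList<>()).add(root.id);
            for (Doc c : root.children) {
                owners.computeIfAbsent(c.id, k -> new ArrayList<>()).add(root.id);
            }
        }
        return owners;
    }

    /** Returns a copy whose child ids are prefixed with the parent id (assumed scheme). */
    static Doc withPrefixedChildIds(Doc root) {
        Doc copy = new Doc(root.id);
        for (Doc c : root.children) {
            copy.child(root.id + "_child_" + c.id);
        }
        return copy;
    }

    public static void main(String[] args) {
        Doc doc1 = new Doc("1111").child("2222"); // parent has id 1111
        Doc doc2 = new Doc("3333").child("1111"); // child has id 1111

        // Flat view: id 1111 belongs to two different blocks, so a lookup is ambiguous.
        System.out.println(ownersById(Arrays.asList(doc1, doc2)).get("1111")); // [1111, 3333]

        // After renaming the child, id 1111 refers only to the parent of doc1.
        Doc fixed = withPrefixedChildIds(doc2);
        System.out.println(ownersById(Arrays.asList(doc1, fixed)).get("1111")); // [1111]
    }
}
```

Any scheme that keeps child ids out of the parents' id space would serve the same purpose; the prefix used here is only one possibility.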