multithreading solr solrj

How does ConcurrentUpdateSolrClient handle update requests?


My application inserts documents into Solr regularly. There are two considerations:

  1. Sending update requests to Solr has a major impact on performance.
  2. Thread safety of transactions. As far as I know, SolrClient's commit is not thread-safe (correct me if I'm wrong), which could cause serious problems when multiple users add documents to Solr at the same time.

I found that ConcurrentUpdateSolrClient is a candidate solution: it is thread-safe, and it has a queue that buffers documents and flushes many of them over one connection. But testing it left me confused. My questions are:

  1. If I set the queue size, do I still need to commit?
  2. If I commit, it still sends an HTTP request to Solr even when there is only one document in the queue. Can I make it work like a message queue?

Solution

  • The SolrClient is thread-safe, and you can share one SolrClient instance across multiple threads as long as your inserts, updates, and deletes all target a single collection or core in the Solr instance.

    But Solr does not have transactions as you might expect from a classic RDBMS.

    Be aware that if several SolrClient instances (in the same app, or in different apps and servers) update a collection/core concurrently, the first client that sends a commit to that collection/core commits all the updates made up to that moment by every client.

    Likewise, if one SolrClient instance sends a rollback, it rolls back all the pending updates, including those made by the other SolrClient instances.
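
The shared commit and rollback behaviour described above can be illustrated with a toy in-memory model. This is only a sketch: `ToyCore` and its methods are made up for illustration and are not part of SolrJ. The model assumes, as stated above, that pending updates from all clients sit in one shared buffer per core.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory stand-in for a single Solr core; NOT the SolrJ API. */
class ToyCore {
    private final List<String> pending = new ArrayList<>(); // uncommitted updates from all clients
    private final List<String> visible = new ArrayList<>(); // committed, searchable documents

    synchronized void add(String doc) {
        pending.add(doc);
    }

    /** A commit makes every pending update visible, whoever sent it. */
    synchronized void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    /** A rollback discards every pending update, whoever sent it. */
    synchronized void rollback() {
        pending.clear();
    }

    /** Returns a copy of the committed documents. */
    synchronized List<String> visible() {
        return new ArrayList<>(visible);
    }
}

class ToyCoreDemo {
    public static void main(String[] args) {
        ToyCore core = new ToyCore();
        core.add("doc-from-client-A");
        core.add("doc-from-client-B");
        core.commit();                   // sent by client A, but also commits B's document
        core.add("doc-from-client-A-2");
        core.rollback();                 // sent by client B, but discards A's pending document
        System.out.println(core.visible()); // prints [doc-from-client-A, doc-from-client-B]
    }
}
```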

    There are many strategies for updating documents concurrently in Solr, and to understand how commits work in Solr I warmly recommend reading

    And if you're writing your own multithreaded application, my one recommendation is to centralise the commits and rollbacks in a single place.
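
One possible way to centralise commits is a small coordinator that is the only component allowed to commit, while writer threads merely signal that they added something. The sketch below uses only the JDK. `CommitAction` and `CommitCoordinator` are hypothetical names, and the commit callback is assumed to wrap whatever commit call the application uses (for example `solrClient.commit()`).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical callback; a real application would wrap its Solr commit call here. */
interface CommitAction {
    void commit() throws Exception;
}

/** The single place in the application that is allowed to commit. */
class CommitCoordinator implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final CommitAction action;

    CommitCoordinator(CommitAction action, long periodMillis) {
        this.action = action;
        scheduler.scheduleWithFixedDelay(this::commitIfDirty, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Writer threads call this after adding documents; they never commit themselves. */
    void markDirty() {
        dirty.set(true);
    }

    /** Commits only if something changed since the last successful commit. */
    synchronized void commitIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            try {
                action.commit();
            } catch (Exception e) {
                dirty.set(true); // leave the flag set so the next tick retries
            }
        }
    }

    /** Stops the periodic task and issues a final commit if anything is still pending. */
    @Override
    public void close() {
        scheduler.shutdown();
        commitIfDirty();
    }
}

class CommitCoordinatorDemo {
    public static void main(String[] args) {
        AtomicInteger commits = new AtomicInteger();
        try (CommitCoordinator coordinator = new CommitCoordinator(commits::incrementAndGet, 60_000)) {
            coordinator.markDirty(); // writer 1 added documents
            coordinator.markDirty(); // writer 2 added documents
        }                            // close() issues one commit covering both
        System.out.println("commits issued: " + commits.get()); // prints "commits issued: 1"
    }
}
```

The design choice here is that many writers collapse into at most one commit per period, so no writer can accidentally commit or roll back another writer's half-finished batch.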

    ConcurrentUpdateSolrClient buffers all added documents and writes them into open HTTP connections. This class is thread safe.

    Although any SolrClient request can be made with this implementation, it is only recommended to use ConcurrentUpdateSolrClient with /update requests. The class HttpSolrClient is better suited for the query interface.
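
To make the buffering idea concrete, here is a simplified, JDK-only model of a client that queues documents and lets a background thread send whatever has accumulated as one batch. It is not the actual implementation of ConcurrentUpdateSolrClient. `BufferingSender` and its `sink` callback (standing in for one HTTP update request) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Simplified model of a buffering update client; NOT the real ConcurrentUpdateSolrClient. */
class BufferingSender implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<List<String>> sink; // hypothetical: stands in for one HTTP update request
    private final Thread runner;
    private volatile boolean closed = false;

    BufferingSender(int queueSize, Consumer<List<String>> sink) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.sink = sink;
        this.runner = new Thread(this::drainLoop, "buffering-sender");
        this.runner.start();
    }

    /** Thread-safe; blocks the caller while the queue is full. */
    void add(String doc) {
        try {
            queue.put(doc);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing " + doc, e);
        }
    }

    private void drainLoop() {
        try {
            while (!closed || !queue.isEmpty()) {
                String first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) continue;   // nothing buffered yet
                List<String> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch);          // take whatever else is already waiting
                sink.accept(batch);            // one "request" carries the whole batch
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Sends what is still buffered and stops the runner. It does not commit anything. */
    @Override
    public void close() {
        closed = true;
        try {
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class BufferingSenderDemo {
    public static void main(String[] args) {
        AtomicInteger requests = new AtomicInteger();
        AtomicInteger docs = new AtomicInteger();
        try (BufferingSender sender = new BufferingSender(4, batch -> {
            requests.incrementAndGet();
            docs.addAndGet(batch.size());
        })) {
            for (int i = 0; i < 20; i++) sender.add("doc-" + i);
        }
        // The number of requests depends on timing, so it varies between runs.
        System.out.println(docs.get() + " documents sent in " + requests.get() + " requests");
    }
}
```

In this model the queue size only bounds how many documents may wait in memory. Draining the queue sends documents but does not make them visible, which suggests that, for the question above, a commit remains a separate step regardless of the queue size.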