solr solrj

How do I use SolrJ's ConcurrentUpdateSolrClient for batch updates of docs to Solr?


I am trying to use SolrJ's ConcurrentUpdateSolrClient to add many docs to Solr in batch mode. I read in Solr in Action that this client yields better performance than HttpSolrClient, but I cannot find any example usage beyond the one below, which is for a query and not an update. An example of how this fits in with using Javabin for the transfer would be very much appreciated. As for the batch process, I am guessing that at a high level it would be the same as with HttpSolrClient: call client.add(doc) many times and then, at some point, call client.commit(). However, there do not seem to be any good examples of this, despite the many times I have read that it is a good way to add batches of docs to Solr.

   SolrClient client = new ConcurrentUpdateSolrClient.Builder("http://my-solr-server:8983/solr/core1").build();
   QueryResponse resp = client.query(new SolrQuery("*:*"));
 

Thank you for any help you can provide.


Solution

  • I found that the code below works, but I am still not sure whether this is all there is to it. Presumably ConcurrentUpdateSolrClient runs more efficiently behind the scenes than HttpSolrClient when there is a large collection of docs to add to Solr. Additional parameters may be needed when setting up the client connection, or other client methods may need to be called before adding documents.

    SolrClient client_concurrentupdate = new ConcurrentUpdateSolrClient.Builder("http://192.168.1.100:8983/solr/test").build();
    SolrInputDocument doc_concurrent = new SolrInputDocument();
    doc_concurrent.addField("id", "1");
    doc_concurrent.addField("name", "Test of concurrent add");
    client_concurrentupdate.add(doc_concurrent);
    client_concurrentupdate.commit();
    

    In a test load of 1000 small documents per method, the concurrent client took 429 ms while HttpSolrClient took 1493 ms. Timing started after each client was built, so the figures cover creating each doc, adding its fields, adding the doc, and the commit for the block of 1000 docs.

    So it seems that, out of the box, there isn't much to do with the concurrent client other than using it in place of HttpSolrClient. However, other answers or comments would still be appreciated.
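On the "behind the scenes" question: the SolrJ Javadoc describes ConcurrentUpdateSolrClient as buffering added documents and writing them into open HTTP connections, and its Builder exposes withQueueSize and withThreadCount for tuning that buffer. The following is only a minimal sketch of that general pattern (a bounded queue drained by background threads) using the plain JDK. It is not SolrJ's implementation. The class name BufferedSender, the send callback, and the use of a Map as a stand-in for SolrInputDocument are all hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration of "buffer docs in a queue, send from background threads".
// Error handling is omitted: a send callback that throws would stop its worker thread.
final class BufferedSender implements AutoCloseable {
    // Sentinel compared by identity; tells a worker thread to stop.
    private static final Map<String, Object> STOP = new HashMap<>();

    private final BlockingQueue<Map<String, Object>> queue;
    private final List<Thread> workers = new ArrayList<>();

    BufferedSender(int queueSize, int threadCount, Consumer<Map<String, Object>> send) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    // Drain the queue until the stop sentinel is seen.
                    for (Map<String, Object> doc = queue.take(); doc != STOP; doc = queue.take()) {
                        send.accept(doc);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
    }

    // Returns as soon as the doc is queued; blocks only while the queue is full.
    void add(Map<String, Object> doc) throws InterruptedException {
        queue.put(doc);
    }

    // Queues one stop sentinel per worker, then waits for all workers to finish.
    @Override
    public void close() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            queue.put(STOP);
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger sent = new AtomicInteger();
        // The lambda stands in for the network call that would send one doc.
        try (BufferedSender sender = new BufferedSender(100, 4, doc -> sent.incrementAndGet())) {
            for (int i = 0; i < 1000; i++) {
                Map<String, Object> doc = new HashMap<>();
                doc.put("id", String.valueOf(i));
                doc.put("name", "Test doc " + i);
                sender.add(doc);
            }
        }
        System.out.println("Documents sent: " + sent.get());
    }
}
```

In this sketch the caller's add loop only waits when the queue is full, while the worker threads do the sending in parallel. That overlap is one plausible reason a buffering client can finish a 1000-document load faster than a client that sends each add synchronously.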