
SolrCloud DIH performance


I have Solr 6.4.2 running in SolrCloud mode and have some doubts about indexing performance.

I am using MS SQL Server as the data source, with the newest JDBC driver for it.

When Solr is started as standalone, my DataImport runs at 31250 docs/s. When Solr is started as SolrCloud (2 replicas), my DataImport runs at 10000 docs/s.

Is there any config parameter that influences this?


Solution

  • It is expected that indexing in SolrCloud is slower than indexing in standalone Solr: it has to index into the replicas too, so there is additional network traffic and latency, plus other coordination work that SolrCloud has to do. Still, you can do a few things to make sure it goes as fast as possible:

    1. Shard the index. Indexing into several shards should be faster (test different shard counts; at some point there will be too many, so don't go crazy).
    2. Send your docs to the shard leader. Indexing is done at the leader first, so sending a doc directly to its leader saves some network traffic. Of course, you have little control over this if you are using DIH, unless you customize your DIH setup to have several handlers, each indexing only the docs for one shard, and call each handler on that shard's leader node.
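
As a rough sketch of both points, the Python snippet below only builds the HTTP requests involved; it does not send anything. The first helper builds a Collections API `CREATE` call with a chosen shard count. The second builds one DIH `full-import` call per shard, addressed to the core on that shard's leader. The collection name, host names, core names, and the per-shard handler names (`/dataimport-shard1`, ...) are all assumptions for illustration: you would have to define those handlers in solrconfig.xml yourself, and each handler's SQL query would have to select only the rows that route to its shard.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request for a sharded collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def shard_import_urls(leader_cores, handler_prefix="/dataimport-"):
    """Build one DIH full-import request per shard.

    leader_cores maps a shard name to the base URL of the core that is
    currently leader for that shard (an assumption: look it up in your
    cluster state, since leaders can change).
    handler_prefix + shard name is a hypothetical handler path that must
    exist in solrconfig.xml.
    """
    urls = {}
    for shard, core_url in leader_cores.items():
        params = {
            "command": "full-import",
            # clean=false so one shard's import does not wipe the documents
            # already indexed by the other handlers.
            "clean": "false",
            "commit": "true",
        }
        urls[shard] = f"{core_url}{handler_prefix}{shard}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983/solr", "products", 4, 2))
    leaders = {
        "shard1": "http://node1:8983/solr/products_shard1_replica1",
        "shard2": "http://node2:8983/solr/products_shard2_replica1",
    }
    for shard, url in shard_import_urls(leaders).items():
        print(shard, url)
```

Whether this actually helps depends on your data and hardware, so measure docs/s for a few shard counts before settling on one.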