Tags: java, solr, lucene, performance-testing, solrcloud

How can Apache Solr handle hundreds of thousands of requests?


We have a small search app running in a local environment. For back-end services we use Apache Solr 6.6.2 for data indexing and storage. The front end is PHP behind an Apache2 web server.

These services run on a single server with 48 cores and 96 GB of RAM. The index is expected to hold about 200 million documents, each with at most 20 fields; most fields are both indexed and stored.

We expect hundreds of thousands of simultaneous requests at a time. What would be the best Apache Solr configuration to handle that load? We started Solr with a 20 GB heap and ran a stress test, but performance begins to degrade at around 100 concurrent users. Where is the problem, and what is the optimal way to fix it?

We have also tested Solr in SolrCloud mode, but performance did not improve much. We expected that a memory problem would surface as an OOM exception, but nothing like that happened. We only changed the schema to suit our requirements and set the heap size on the command line; all other settings are defaults.
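
For reference, a minimal sketch of how the heap can be set when starting Solr 6.x from the command line (the 20g value matches the test described above; everything else is left at its defaults):

    # Standalone mode with a 20 GB heap (-m sets both -Xms and -Xmx)
    bin/solr start -m 20g

    # SolrCloud mode with the same heap; -c starts Solr with an
    # embedded ZooKeeper unless -z points at an external ensemble
    bin/solr start -c -m 20g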

Following are a few references that we have already consulted:

  1. https://wiki.apache.org/solr/SolrPerformanceProblems
  2. https://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/

Solution

  • We have 200 million records in each collection and we have 200 collections, on 5 servers with 8 cores and 64 GB RAM each.

    I would suggest you split the load across multiple servers.

    Replicate the data onto each server so that requests get distributed across multiple servers. The more servers you have, the quicker you'll be able to respond.

    Note: keep the 2F+1 replication-factor formula in mind: with 5 servers there should be at least 3 replicas. I'd suggest you go with 5 replicas (one replica per server), as sketched below.
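
    As a rough sketch of that suggestion, assuming a 5-node SolrCloud cluster (the collection name "products" and the single-shard layout are just placeholders):

        # One shard, replicated onto all 5 nodes (one replica per server)
        bin/solr create -c products -shards 1 -replicationFactor 5

        # Equivalent call against the Collections API
        curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=1&replicationFactor=5"

    With one replica on every node, any node can answer any query, so a load balancer in front of the cluster can spread the incoming requests evenly.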