
ElasticSearch BulkShardRequest failed due to org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor


I am storing logs in Elasticsearch from my reactive Spring application, and Elasticsearch returns the following error:

Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of processing of [129010665][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logs-dev-2020.11.05][1]] containing [index {[logs-dev-2020.11.05][_doc][0d1478f0-6367-4228-9553-7d16d2993bc2], source[n/a, actual length: [4.1kb], max length: 2kb]}] and a refresh, target allocation id: WwkZtUbPSAapC3C-Jg2z2g, primary term: 1 on EsThreadPoolExecutor[name = 10-110-23-125-common-elasticsearch-apps-dev-v1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6599247a[Running, pool size = 2, active threads = 2, queued tasks = 221, completed tasks = 689547]]]

My index settings:

{
    "logs-dev-2020.11.05": {
        "settings": {
            "index": {
                "highlight": {
                    "max_analyzed_offset": "5000000"
                },
                "number_of_shards": "3",
                "provided_name": "logs-dev-2020.11.05",
                "creation_date": "1604558592095",
                "number_of_replicas": "2",
                "uuid": "wjIOSfZOSLyBFTt1cT-whQ",
                "version": {
                    "created": "7020199"
                }
            }
        }
    }
}

I have gone through this site:

https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster

I thought increasing the queue size of the "write" thread pool would resolve this, but the article advises against it:

Adjusting the queue sizes is therefore strongly discouraged, as it is like putting a temporary band-aid on the problem rather than actually fixing the underlying issue.

So what else can we do to improve the situation?

Other info:


Solution

  • You are right that increasing the queue size of the write thread pool is not a permanent solution. Still, you will be glad to know that Elasticsearch itself raised the default queue size of the write thread pool (the one used by your bulk requests) from 200 to 10k in a single minor version upgrade: it is 200 in ES 7.8 and 10k in ES 7.9.

    If you are on ES 7.x, you can also increase the queue size (the thread_pool.write.queue_size node setting), if not to 10k then at least to 1k, to avoid rejected requests.

    For a proper fix, do the following:

    1. Find out whether the rejections are consistent or just a short burst of write requests that clears after some time.
    2. If they are consistent, check whether all the write optimizations are in place; please refer to my short tips to improve indexing speed.
    3. Check whether your data nodes have reached full capacity and, if so, scale your cluster to handle the increased, legitimate load.
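
To make step 1 concrete, here is a minimal Python sketch, not a definitive tool. It assumes you poll `GET _cat/thread_pool/write?format=json&h=node_name,name,active,queue,rejected` at a fixed interval and keep each response as a list of rows. The function names, the labels "none", "burst" and "sustained", and the node name in the usage note are hypothetical and only for illustration.

```python
from typing import Dict, List

# One node's row from the _cat/thread_pool API; with format=json the
# numeric columns arrive as strings, so they are converted below.
Row = Dict[str, str]


def rejected_total(sample: List[Row]) -> int:
    """Sum the cumulative 'rejected' counter over all nodes in one sample."""
    return sum(int(row["rejected"]) for row in sample)


def classify(samples: List[List[Row]]) -> str:
    """Label the rejection pattern across consecutive samples.

    'none'      -> the counter never grew
    'sustained' -> the counter grew in every interval
    'burst'     -> the counter grew in some intervals only
    """
    totals = [rejected_total(sample) for sample in samples]
    growth = [later - earlier for earlier, later in zip(totals, totals[1:])]
    if not any(g > 0 for g in growth):
        return "none"
    if all(g > 0 for g in growth):
        return "sustained"
    return "burst"
```

For example, three samples from a single node called "node-1" whose rejected counter reads 10, 25 and 25 would be labelled "burst", while 10, 25 and 40 would be labelled "sustained". The rejected counter is cumulative per node and resets when a node restarts, so a drop between samples is treated here as no growth.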