
ElasticSearch querying using JestClient seems to be very slow


I recently read about Elasticsearch, and I am using Jest to interact with Amazon Elasticsearch Service. By following Jest's documentation, I have been able to index data into ES.

However, when I run a boolean query, I see extremely high latencies. When I send the same query as a POST request from Postman, the latency is much lower.

Here's the example:

Jest query (given a key and a value, return a list of matching objects):

// client: an already-initialized JestClient (created elsewhere, e.g. with JestClientFactory)

String query = "{\n" +
        "    \"query\" : \n" +
        "        {\"bool\": \n" +
        "            { \"must\": \n" +
        "                [\n" +
        "                    {\"match\": \n" +
        "                        {\"" + key +"\" : \"" + value + "\"}\n" +
        "                    }\n" +
        "                ]\n" +
        "            }\n" +
        "        }\n" +
        "}";

long startTime = System.currentTimeMillis();
long endTime;

Search search = new Search.Builder(query)
        // multiple indices or types can be added
        .addIndex(indexName)
        .addType(typeName)
        .build();
endTime = System.currentTimeMillis();
System.out.println("SearchBuilder: " + (endTime - startTime));

startTime = endTime;
JestResult result = client.execute(search);
endTime = System.currentTimeMillis();

System.out.println("ClientExecute: " + (endTime - startTime));

return result.getSourceAsObjectList(Object.class);

Output (times in milliseconds):

SearchBuilder: 12
ClientExecute: 1193
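The ClientExecute figure above comes from a single call timed with System.currentTimeMillis(). A single sample can include one-off costs; for example, the first request a fresh client sends to a remote HTTPS endpoint typically also pays for opening the connection and the TLS handshake. Below is a minimal sketch, assuming the call can be wrapped as a Callable, that repeats it and records each run's duration with System.nanoTime(). The class and method names (RepeatedTiming, timeRuns) are hypothetical and not part of Jest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RepeatedTiming {

    /**
     * Runs the action the given number of times and returns each run's
     * wall-clock duration in milliseconds. System.nanoTime() is used because
     * it is a monotonic clock meant for measuring elapsed time.
     */
    public static List<Long> timeRuns(int runs, Callable<?> action) {
        List<Long> durationsMs = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            try {
                action.call();
            } catch (Exception e) {
                // Rethrow failures (e.g. an IOException from the request) as unchecked.
                throw new IllegalStateException("timed action failed on run " + i, e);
            }
            long elapsedNanos = System.nanoTime() - start;
            durationsMs.add(elapsedNanos / 1_000_000);
        }
        return durationsMs;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the question this would be () -> client.execute(search).
        List<Long> durations = timeRuns(5, () -> {
            Thread.sleep(20);
            return null;
        });
        System.out.println("Per-run durations (ms): " + durations);
    }
}
```

In the question's setup the call would look something like RepeatedTiming.timeRuns(5, () -> client.execute(search)). If only the first run is slow, one-off setup work rather than the query itself is the more likely cause; if every run is slow, the per-request network round trip is the next thing to examine.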

Using Postman, on the other hand, I send a POST request with the body:

{
    "query" : {"bool": { "must": [{"match": {key : value}}]}}
}

The request is sent to es.ap-southeast-1.es.amazonaws.com/index/_search, and the response contains:

"took": 1, "timed_out": false, "_shards": { "total": 10, "successful": 10, "failed": 0 },

I tried using SearchSourceBuilder as well, but to no avail. Am I using the right API?


Solution

  • This line

    "took": 1, "timed_out": false, "_shards": { "total": 10, "successful": 10, "failed": 0 }
    

    tells you how long the Elasticsearch engine itself took to run the query (the "took" value is in milliseconds). It does not include the latency of sending the query to the cluster or of returning the results to you over the internet. Your JestClient measurement does include that time, so it is entirely possible that the query executes just as fast in both cases and the difference is simply the time spent transmitting and receiving data.

    I'm unfamiliar with Jest, but I've used NEST in C# (which I assume works in much the same way), and there you can read the same "took" and "timed_out" values from the returned result object. The Jest result should expose them as well.
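One way to act on this is to compare the server-side "took" value with your own wall-clock measurement of the same request; the difference is the time spent outside the query engine (network round trip, TLS, serialization). Below is a dependency-free sketch, assuming you pass in the raw response body (with Jest, JestResult.getJsonString() returns it, and JestResult.getJsonObject() exposes it as a Gson JsonObject). The class and method names are hypothetical, and the regex is only a shortcut for a real JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TookOverhead {

    // Matches the first "took" field in the response, e.g. "took": 1.
    // In a search response the top-level "took" appears before any hits.
    private static final Pattern TOOK = Pattern.compile("\"took\"\\s*:\\s*(\\d+)");

    /** Returns the server-side "took" value in milliseconds, or -1 if absent. */
    public static long parseTookMillis(String responseJson) {
        Matcher m = TOOK.matcher(responseJson);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    /** Client-observed time minus server-side query time, in milliseconds. */
    public static long overheadMillis(long clientElapsedMillis, String responseJson) {
        long took = parseTookMillis(responseJson);
        if (took < 0) {
            throw new IllegalArgumentException("response has no \"took\" field");
        }
        return clientElapsedMillis - took;
    }

    public static void main(String[] args) {
        // Response fragment from the question's Postman request.
        String response = "{\"took\": 1, \"timed_out\": false, "
                + "\"_shards\": { \"total\": 10, \"successful\": 10, \"failed\": 0 }}";
        System.out.println("took (ms): " + parseTookMillis(response));
        // 1193 ms is the ClientExecute time reported for the Jest call.
        System.out.println("overhead (ms): " + overheadMillis(1193, response));
    }
}
```

The main method pairs the question's 1193 ms ClientExecute time with the "took": 1 from the Postman response and prints an overhead of 1192 ms. That pairing is only illustrative, since the two numbers come from different requests; in practice, read "took" from the same response you timed.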