java amazon-web-services amazon-ecs aws-elasticsearch

SocketTimeoutException while searching documents on AWS Elasticsearch


Context

My team runs a REST service on AWS Elastic Container Service (ECS). The service stores its data in AWS Elasticsearch.

The following is the Elasticsearch client setup:

YAML Config:

elasticConfiguration:
  host: ${ELASTIC_SEARCH_HOST:-localhost}
  port: ${ELASTIC_SEARCH_PORT:-9200}
  scheme: ${ELASTIC_SEARCH_SCHEME:-http}
  username: ${ELASTIC_SEARCH_USERNAME}
  password: ${ELASTIC_SEARCH_PASSWORD}
  connectionTimeout: 3000
  socketTimeout: 3000
  maxConnections: 50

Bean Config:

    @Provides
    @Singleton
    public RestHighLevelClient elasticClient(final ElasticConfiguration elasticConfiguration) {
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        if (Optional.ofNullable(elasticConfiguration.getPassword()).map(String::trim).filter(Predicate.not(StringUtils::isNullOrEmpty)).isPresent()) {
            credentialsProvider.setCredentials(AuthScope.ANY,
                    new UsernamePasswordCredentials(elasticConfiguration.getUsername(), elasticConfiguration.getPassword()));
        }
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost(elasticConfiguration.getHost(), elasticConfiguration.getPort(), elasticConfiguration.getScheme()))
                        .setHttpClientConfigCallback(
                                  httpClientBuilder -> httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
                                          .setMaxConnPerRoute(elasticConfiguration.getMaxConnections())
                                          .setMaxConnTotal(elasticConfiguration.getMaxConnections()))
                        .setRequestConfigCallback(
                                  requestConfigBuilder -> requestConfigBuilder.setConnectTimeout(elasticConfiguration.getConnectionTimeout())
                                          .setSocketTimeout(elasticConfiguration.getSocketTimeout()))
                        .setNodeSelector(NodeSelector.SKIP_DEDICATED_MASTERS));
    }

Problem:

Sometimes our team observes the following error in our log:

 exception: SocketTimeoutException
 severity: ERROR
 identifier: Txn-4913e369-b046-4a72-bc8a-b25a722d62a7 - Req-a0797e45-d0d0-4a8d-a4b3-ace61b03010a
 source: stdout
 thread: dw-469 - POST /cerebro/v1/index/store_products/documents/search?per_page=1
 message: Error in searching documents with message 3,000 milliseconds timeout on connection http-outgoing-834 [ACTIVE] and error
! java.net.SocketTimeoutException: 3,000 milliseconds timeout on connection http-outgoing-834 [ACTIVE]
! at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
! at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
! at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
! at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
! at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
! at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
! at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
! at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
! at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
! at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
! ... 1 common frames omitted
! Causing: java.net.SocketTimeoutException: 3,000 milliseconds timeout on connection http-outgoing-834 [ACTIVE]
! at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:850)
! at org.elasticsearch.client.RestClient.performRequest(RestClient.java:275)
! at org.elasticsearch.client.RestClient.performRequest(RestClient.java:262)
! at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1628)
! at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1598)
! at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1568)
! at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1084)
! at in.dreamplug.store.service.search.elastic.ElasticSearchService.searchDocuments(ElasticSearchService.java:116)
! at in.dreamplug.store.service.search.DocumentSearchService.searchDocuments(DocumentSearchService.java:78)
! at in.dreamplug.store.resource.DocumentSearchResource.searchDocuments(DocumentSearchResource.java:61)
! at jdk.internal.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
! at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.base/java.lang.reflect.Method.invoke(Method.java:568)

Potential Solutions Tried:

I tried two approaches that were suggested elsewhere:

  1. Connection evictor thread: periodically check for idle connections and close them from a separate thread.
  2. TCP keepalives: enable the keep-alive flag and lower the value of net.ipv4.tcp_keepalive_time.

Unfortunately, neither produced any major improvement. What else can be done?


Solution

  • I got in touch with the AWS support team.

    They said my domain was seeing continuous HTTP 460 errors, that is, requests that failed because the client dropped the connection. The drop-offs were caused by the low socket timeout of 3 seconds: the client gave up whenever no response data arrived within that time.
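To make the socket timeout behaviour concrete, here is a minimal, standard-library-only sketch. It is not the Elasticsearch client; the class name `SocketTimeoutDemo`, the helper `replyArrivesInTime`, and the scaled-down delays are all made up for illustration. A local server answers after a delay, and the client reads with a given socket (read) timeout:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; names and timings are illustrative only.
class SocketTimeoutDemo {

    /**
     * Connects to a local server that replies after serverDelayMs.
     * Returns true if the reply is read before the socket timeout expires,
     * false if the read fails with SocketTimeoutException.
     */
    static boolean replyArrivesInTime(int socketTimeoutMs, int serverDelayMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread slowServer = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(serverDelayMs);        // simulate a slow search
                    peer.getOutputStream().write('k');  // one-byte "response"
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have hung up; nothing to do in this demo.
                }
            });
            slowServer.start();

            try (Socket client = new Socket()) {
                // Connect timeout: only covers establishing the TCP connection.
                client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 3_000);
                // Socket timeout: maximum wait for data on an established connection.
                client.setSoTimeout(socketTimeoutMs);
                return client.getInputStream().read() == 'k';
            } catch (SocketTimeoutException e) {
                return false; // the client gave up and closes the connection
            } finally {
                slowServer.join(); // let the server thread finish before closing the listener
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("short socket timeout: " + replyArrivesInTime(200, 800));
        System.out.println("long socket timeout:  " + replyArrivesInTime(2_000, 200));
    }
}
```

With these values the first call should report `false` (the read times out and the client closes the connection while the server is still working) and the second `true`. The connect timeout plays no part in either outcome, which is consistent with leaving it at 3 seconds.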

    They therefore suggested the following changes:

    1. Increase the socket timeout to 30 seconds. The connection timeout can stay at 3 seconds.
    2. Call setSoKeepAlive(true) to enable TCP keepalive on the client's connections.

    I deployed these changes and haven't encountered the error since.
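For reference, the two changes could be wired into the client roughly as follows. This is a configuration sketch under assumptions, not the exact code that was deployed: the class `ElasticClientFactory` and its `create` method are hypothetical, the credentials and connection-pool settings from the question are left out for brevity, and placing `setSoKeepAlive(true)` on an `IOReactorConfig` is my reading of where that flag belongs for the async HTTP client used by `RestClient`.

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical factory; in the question this logic lives in the @Provides method.
public final class ElasticClientFactory {

    private ElasticClientFactory() {
    }

    public static RestHighLevelClient create(String host, int port, String scheme) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, scheme))
                .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                        // Change 2: enable TCP keepalive (SO_KEEPALIVE) on the client's sockets.
                        .setDefaultIOReactorConfig(IOReactorConfig.custom()
                                .setSoKeepAlive(true)
                                .build()))
                .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                        // Connection timeout stays at 3 seconds.
                        .setConnectTimeout(3_000)
                        // Change 1: socket timeout raised from 3 to 30 seconds.
                        .setSocketTimeout(30_000));
        return new RestHighLevelClient(builder);
    }
}
```

If the timeouts stay in the YAML config as in the question, the equivalent change there would be setting socketTimeout to 30000 and leaving connectionTimeout at 3000.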