We are experiencing query failures against our Apache Doris cluster. After our application has been running for some time (anywhere from several hours to a day), queries begin failing with the error "Connection is not available, request timed out".
We initially tried to solve this by increasing the maximum size of our connection pool (connection_pool_max_size) to 1000. However, this only delayed the problem; the application still eventually consumes all available connections and fails. This strongly suggests a connection leak in our client application code.
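To illustrate why a larger pool only postpones the failure, here is a minimal, self-contained sketch. It does not use the Doris JDBC driver or any real pool library. `TinyPool`, `PooledConn`, `leakyQuery` and `safeQuery` are hypothetical names. A semaphore stands in for the pool's capacity. In the leaky version each borrowed connection is never returned, so once every slot is taken, later requests time out. The safe version uses try-with-resources, so the connection is returned even if the query throws.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded pool used only to illustrate how a leak exhausts capacity.
// It is NOT HikariCP, the MySQL/Doris JDBC driver, or any real pool.
public class LeakDemo {

    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxSize) {
            this.permits = new Semaphore(maxSize);
        }

        // Borrow a connection, waiting at most timeoutMs (like a pool's connection timeout).
        PooledConn borrow(long timeoutMs) throws InterruptedException {
            if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Connection is not available, request timed out");
            }
            return new PooledConn(this);
        }

        int available() {
            return permits.availablePermits();
        }

        void giveBack() {
            permits.release();
        }
    }

    // A borrowed connection; close() returns it to the pool exactly once.
    static final class PooledConn implements AutoCloseable {
        private final TinyPool pool;
        private boolean closed = false;

        PooledConn(TinyPool pool) {
            this.pool = pool;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.giveBack();
            }
        }
    }

    // Leaky pattern: the connection is never closed, so its pool slot is lost for good.
    static void leakyQuery(TinyPool pool) throws InterruptedException {
        PooledConn conn = pool.borrow(10);
        // ... run the query, but conn.close() is never called ...
    }

    // Safe pattern: try-with-resources closes the connection even if the query throws.
    static void safeQuery(TinyPool pool) throws InterruptedException {
        try (PooledConn conn = pool.borrow(10)) {
            // ... run the query ...
        }
    }

    // Runs `calls` queries against the pool and returns how many timed out.
    static int countFailures(TinyPool pool, int calls, boolean leaky) throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < calls; i++) {
            try {
                if (leaky) {
                    leakyQuery(pool);
                } else {
                    safeQuery(pool);
                }
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool leakyPool = new TinyPool(5);
        System.out.println("leaky failures: " + countFailures(leakyPool, 20, true)); // 15
        TinyPool safePool = new TinyPool(5);
        System.out.println("safe failures: " + countFailures(safePool, 20, false)); // 0
    }
}
```

With a pool of 5 and 20 leaky calls, the first 5 succeed and the remaining 15 time out. Raising `maxSize` only changes how many calls succeed before the failures start, which matches what we saw after going to 1000. If the pool in use is HikariCP (its timeout exception uses this same message text), its `leakDetectionThreshold` setting can log the stack trace of the code that borrowed a connection and held it too long.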
I know this problem; you can fix it as follows: