Tags: cassandra, datastax, cassandra-cli

Cassandra throwing NoHostAvailableException after 5 minutes of high IOPS run


I'm using the DataStax Cassandra 2.1 driver and performing read/write operations at a rate of ~8000 IOPS. I've used pooling options to configure my sessions, and I'm using a separate session for reads and for writes, each of which connects to a different node in the cluster as its contact point. This works fine for about 5 minutes, but after that I get a lot of exceptions like:

Failed with: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.0.1.123:9042 (com.datastax.driver.core.TransportException: [/10.0.1.123:9042] Connection has been closed), /10.0.1.56:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))

Can anyone help me figure out what the problem could be?

The exception suggests increasing the number of connections per host, but how high can I set this parameter? Also, I'm not able to set CoreConnectionsPerHost beyond 2, as it throws an exception saying 2 is the max.

This is how I'm creating each read / write session.

    PoolingOptions poolingOpts = new PoolingOptions();
    poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
    poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200);
    poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
    poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);

    cluster = Cluster
        .builder()
        .withPoolingOptions(poolingOpts)
        .addContactPoint(ip)
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
        .build();
    Session s = cluster.connect(keySpace);

Solution

  • Your problem might not actually be in your code or the way you are connecting. If the problem only appears after a few minutes, it could simply be that your cluster is becoming overloaded trying to process the ingestion of data and cannot keep up. The typical sign of this is when you start seeing JVM garbage collection ("GC") messages in the Cassandra system.log file; too many small ones close together, or large ones on their own, can mean that incoming clients are not being responded to, causing this kind of scenario. Verify that you do not have too many of these events showing up in your logs before you start looking at your code. Here's a good example of a large GC event:

    INFO [ScheduledTasks:1] 2014-05-15 23:19:49,678 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2896 ms for 2 collections, 310563800 used; max is 8375238656

    When connecting to a cluster there are some recommendations, one of which is to have only one Cluster object per physical cluster. See the article linked below (apologies if you have already studied this); a minimal sketch of that pattern follows the link:

    http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/fourSimpleRules.html
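
    As a rough sketch of that rule (the class name, keyspace and addresses below are illustrative placeholders, not taken from your setup): build a single Cluster once, listing all contact points, and share one Session for both reads and writes, since Session is thread-safe.

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.Session;

        // Hypothetical holder class: one Cluster per physical cluster, one shared Session.
        public class CassandraConnector {

            // List several contact points so the driver can still bootstrap
            // if one of the nodes happens to be down at startup.
            private static final Cluster CLUSTER = Cluster.builder()
                    .addContactPoints("10.0.1.123", "10.0.1.56")
                    .build();

            // Session is thread-safe: reuse it for reads and writes instead of
            // opening a separate session per workload.
            private static final Session SESSION = CLUSTER.connect("my_keyspace");

            public static Session session() {
                return SESSION;
            }
        }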

    As you are doing a high number of reads, I'd also definitely recommend using setFetchSize, if it's applicable to your code (see the sketch after the two links below):

    http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html

    http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/queryBuilderOverview.html
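
    For example, in the 2.1 driver you can set a cluster-wide default page size through QueryOptions, or override it per statement. A rough sketch (the keyspace and table names are made up):

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.QueryOptions;
        import com.datastax.driver.core.ResultSet;
        import com.datastax.driver.core.Row;
        import com.datastax.driver.core.Session;
        import com.datastax.driver.core.SimpleStatement;
        import com.datastax.driver.core.Statement;

        public class FetchSizeExample {
            public static void main(String[] args) {
                // Cluster-wide default page size for all queries.
                Cluster cluster = Cluster.builder()
                        .addContactPoint("10.0.1.123")
                        .withQueryOptions(new QueryOptions().setFetchSize(1000))
                        .build();
                Session session = cluster.connect("my_keyspace");

                // Per-statement override for a particularly large result set.
                Statement stmt = new SimpleStatement("SELECT * FROM my_table")
                        .setFetchSize(500);

                ResultSet rs = session.execute(stmt);
                for (Row row : rs) {
                    // Rows are fetched page by page as you iterate,
                    // rather than being loaded into memory all at once.
                }

                cluster.close();
            }
        }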

    For reference, here are the connection options docs in case you find them useful; a hedged example of tuning them follows the link:

    http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/connectionsOptions_c.html
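
    One thing worth double-checking when you tune those pooling options: with the default load balancing policy, nodes in your own datacenter (including your contact points) are usually treated as LOCAL distance, so REMOTE-only settings may never actually apply to them. Below is an illustrative sketch of setting the LOCAL pool instead; the numbers and addresses are placeholders, not recommendations:

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.HostDistance;
        import com.datastax.driver.core.PoolingOptions;
        import com.datastax.driver.core.Session;

        public class PoolingExample {
            public static void main(String[] args) {
                PoolingOptions pooling = new PoolingOptions();

                // Set max before core so core never exceeds the current max.
                pooling.setMaxConnectionsPerHost(HostDistance.LOCAL, 8);
                pooling.setCoreConnectionsPerHost(HostDistance.LOCAL, 4);
                pooling.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL, 128);

                Cluster cluster = Cluster.builder()
                        .addContactPoint("10.0.1.123")
                        .withPoolingOptions(pooling)
                        .build();
                Session session = cluster.connect("my_keyspace");

                // ... run your read/write workload with this single session ...

                cluster.close();
            }
        }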

    Hope this helps.