While performing Cassandra operations (batch execution: insert and update operations on two tables) from a Spark job, I am getting an "All host(s) tried for query failed - com.datastax.driver.core.OperationTimedOutException" error.
Cluster information:
Cassandra 2.1.8.621 | DSE 4.7.1
spark-cassandra-connector-java_2.10 version - 1.2.0-rc1 | cassandra-driver-core version - 2.1.7
Spark 1.2.1 | Hadoop 2.7.1 => 3 nodes
Cassandra 2.1.8 => 5 nodes
Each node has 28 GB of memory and 24 cores.
While searching for a solution I came across some discussions that say you should not use BATCHES. I would still like to find the root cause of this error. Also, how and where do I set or get "SocketOptions.setReadTimeout"? As per the standard guideline, this timeout must be greater than the Cassandra request timeout to avoid possible errors.
Are request_timeout_in_ms and SocketOptions.setReadTimeout the same thing? Can anyone help me with this?
While performing Cassandra operations (batch execution: insert and update operations on two tables) from a Spark job, I am getting an "All host(s) tried for query failed - com.datastax.driver.core.OperationTimedOutException" error.
Directly from the docs:
The most common cause of this is that Spark is able to issue write requests much more quickly than Cassandra can handle them. This can lead to GC issues and build up of hints. If this is the case with your application, try lowering the number of concurrent writes and the current batch size using the following options.
spark.cassandra.output.batch.size.rows
spark.cassandra.output.concurrent.writes
or in versions of the Spark Cassandra Connector greater than or equal to 1.2.0 set
spark.cassandra.output.throughput_mb_per_sec
which will allow you to control the amount of data written to C* per Spark core per second.
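As a rough sketch of how these options could be passed to a job, the helper below collects the three property names mentioned above and renders them as "--conf key=value" arguments for spark-submit. The class name WriteTuning, its method names, and the values 50, 2 and 5 are illustrative assumptions, not recommendations from the docs; suitable values depend on your cluster and should be found by testing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: only the property names come from the connector docs.
public final class WriteTuning {
    private WriteTuning() {}

    /** Connector write settings with illustrative (assumed) values. */
    public static Map<String, String> conservativeWriteSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        // Fewer rows per batch means smaller individual write requests.
        settings.put("spark.cassandra.output.batch.size.rows", "50");
        // Fewer batches in flight per Spark task.
        settings.put("spark.cassandra.output.concurrent.writes", "2");
        // Connector >= 1.2.0 only: cap write throughput per core.
        settings.put("spark.cassandra.output.throughput_mb_per_sec", "5");
        return settings;
    }

    /** Renders each setting as a "--conf" flag followed by "key=value". */
    public static List<String> toSubmitArgs(Map<String, String> settings) {
        List<String> submitArgs = new ArrayList<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            submitArgs.add("--conf");
            submitArgs.add(entry.getKey() + "=" + entry.getValue());
        }
        return submitArgs;
    }
}
```

The same key/value pairs can instead be applied in code with SparkConf.set(key, value) before the Spark context is created.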
you should not use BATCHES
This is not always true. The connector uses local, token-aware batches for faster reads and writes, but this is tricky to get right in a custom app. In many cases async queries are better or just as good.
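To illustrate the async alternative in a custom app, here is a minimal sketch that issues one asynchronous write per row while capping how many are outstanding at once, so the client cannot outrun Cassandra. It uses only the Java standard library (Java 8+): the asyncWrite function is a hypothetical stand-in for whatever driver call you use (the 2.1 Java driver's executeAsync returns its own future type, which you would need to adapt), and BoundedAsyncWriter and maxInFlight are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper, not part of the connector or the driver.
public final class BoundedAsyncWriter {
    private BoundedAsyncWriter() {}

    /**
     * Starts one async write per row, never allowing more than maxInFlight
     * writes to be outstanding, and blocks until all of them have finished.
     * Throws CompletionException if any write failed.
     */
    public static <T> void writeAll(List<T> rows,
                                    Function<T, CompletableFuture<Void>> asyncWrite,
                                    int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T row : rows) {
            // Wait here while maxInFlight writes are already outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> write;
            try {
                write = asyncWrite.apply(row);
            } catch (RuntimeException e) {
                permits.release(); // the write never started
                throw e;
            }
            // Free the permit when the write completes, successfully or not.
            pending.add(write.whenComplete((ok, err) -> permits.release()));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

The cap plays the same role as the connector's concurrent-writes setting: lowering it trades throughput for less pressure on the cluster.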
setReadTimeout
This is a DataStax Java driver method. The connector takes care of this for you, so there is no need to change it.