Using DSE 4.8.6 (C* 2.1.13.1218)
When I try adding a new node in a new datacenter, bootstraping / node rebuild is always interrupted by streaming errors.
Error example from system.log:
ERROR [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:28,531 StreamSession.java:621 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Remote peer 172.31.47.213 failed stream session.
INFO [STREAM-IN-/172.31.47.213] 2016-04-19 12:30:30,665 StreamResultFuture.java:180 - [Stream #743d44e0-060e-11e6-985c-c1820b05e9ae] Session with /172.31.47.213 is complete
There is about 500GB of data to be streamed to the new node. Boostrap or rebuild operation stream those from 4 different nodes on the other (main) DC.
When a streaming error occurs, all synced data is wiped (and I have to start over).
What I tried so far:
auto_boostrap: False
in cassandra.yaml
and manually run nodetool rebuild
streaming_socket_timeout_in_ms
and setting up more aggressive TCP Keep Alive values in my linux conf (following advice in the CASSANDRA-9440 ticket)phi_convict_threshold
(to the max)Any other things I should try ? I'm in the process of running nodetool scrub
on every failing node to see if this helps...
On the stream out node, these are the error messages:
ERROR [STREAM-IN-/172.31.45.28] 2016-05-11 13:10:43,842 StreamSession.java:505 - [Stream #ecfe0390-1763-11e6-b6c8-c1820b05e9ae] Streaming error occurred
java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229) ~[na:1.7.0_80]
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.7.0_80]
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.7.0_80]
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) ~[cassandra-all-2.1.14.1272.jar:2.1.14.1272]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257) ~[cassandra-all-2.1.14.1272.jar:2.1.14.1272]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
and then:
INFO [STREAM-IN-/172.31.45.28] 2016-05-10 07:59:14,023 StreamResultFuture.java:180 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Session with /172.31.45.28 is complete
WARN [STREAM-IN-/172.31.45.28] 2016-05-10 07:59:14,023 StreamResultFuture.java:207 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Stream failed
ERROR [STREAM-OUT-/172.31.45.28] 2016-05-10 07:59:14,024 StreamSession.java:505 - [Stream #ea1271b0-1679-11e6-917a-c1820b05e9ae] Streaming error occurred
java.lang.AssertionError: Memory was freed
at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:338) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
As answered in the Cassandra ticket CASSANDRA-11345, this issue was due to a big SSTable file (40GB) being transferred.
The transfer of said file takes more than 1 hour and by default streaming operations time out if an outgoing transfer takes more than 1 hour.
To change this default behavior you can set the streaming_socket_timeout_in_ms in the cassandra.yaml
configuration file to a large value (eg: 72000000 ms or 20 hours)