I've recently been getting the following error on the system.log files on both my production and demo clusters. Each cluster has 2 nodes and replication factor is 2. No changes have been made to my knowledge. I cannot figure out what the reason behind the error is. It is causing INSERT and UPDATE statements to fail.
[SharedPool-Worker-27] ERROR org.apache.cassandra.transport.Message - Unexpected exception during request; channel = [id: 0xeb429d31, /14.0.0.1:34495 => /14.0.0.2:9042]
java.lang.AssertionError: -2146739295
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:46) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:39) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.AbstractCell.create(AbstractCell.java:176) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.UpdateParameters.makeColumn(UpdateParameters.java:65) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.Constants$Setter.execute(Constants.java:314) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:110) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:57) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:708) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:495) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:481) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.10.jar:2.1.10]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.10.jar:2.1.10]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
These are async requests. On the client side I'm seeing a fail on the future as well.I'm using cassandra-2.1.10.I haven't done a rolling restart of the nodes yet, but I don't think that will fix the problem.
Also noticed that it seems like the failed inserts/updates happen after a few successful inserts/updates. The request statements themselves (formatting) are fine. Any help would be appreciated.
UPDATE: I've looked into the cassandra source code. It contains the following:
assert timeToLive > 0 : timeToLive;
assert localExpirationTime > 0 : localExpirationTime;
Looks like it's failing on the second assert statement. The table has a TTL value set in its properties for 1728000 seconds. No ttl is being set in the insert/update statements. So I don't understand why some of the statements are failing on this assert.
EDIT: on the client applications I see the following error messages:
Client 1 connects to cluster 1:
16:36:01.102 [New I/O worker #64] WARN - /14.0.0.2:9042 replied with server error (java.lang.AssertionError: -2146571535), trying next host
Client 2 connects to cluster 2:
16:30:01.302 [cluster1-nio-worker-7] WARN - /14.0.0.4:9042 replied with server error (java.lang.AssertionError: -2146571895), defuncting connection.
I believe what is happening is when the above errors happen the client drops the connection and reconnects. During this time other async requests fail.
One of the tables had its 'default_time_to_live' set to about 19 years. The reason behind the problem was the 2038 timestamp problmem. Even though the ttl value itself on each cell is the number of seconds left, seems like cassandra internally tries to convert the expiry time into a timestamp. So current timestamp + ttl (19+) years = a timestamp beyond 19 January, 2038. This was causing the overflow and the negative number seen in the exception shown above. Reducing the default ttl value on the table fixed the problem stopped the assertion error from happening.
Seemed like a few assertion errors, would cause the connections to reset and in the meanwhile other writes would fail.