cassandra cassandra-4.0

Bootstrap keeps failing: StreamingTombstoneHistogramBuilder throws "AssertionError: Invalid arguments: cell:43764 point:-2144718900 delta:1"


The bootstrap fails with the following stream error.

Existing cluster: 6 nodes - CentOS 7 - Cassandra version 4.0.11

New node being added to the existing cluster: Ubuntu 22.04 LTS - Cassandra version 4.0.14

ERROR [Stream-Deserializer-/x.x.x.x:7000-7879f952] 2024-11-22 05:20:52,935 StreamSession.java:731 - [Stream #7d3711a0-a891-11ef-903f-c9da906269c9] Streaming error occurred on session with peer x.x.x.:7000
org.apache.cassandra.streaming.StreamReceiveException: java.lang.AssertionError: Invalid arguments: cell:43764 point:-2144718900 delta:1
    at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:60)
    at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
    at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
    at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:172)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.AssertionError: Invalid arguments: cell:43764 point:-2144718900 delta:1
    at org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.tryCell(StreamingTombstoneHistogramBuilder.java:496)
    at org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.tryAddOrAccumulate(StreamingTombstoneHistogramBuilder.java:471)
    at org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.update(StreamingTombstoneHistogramBuilder.java:106)
    at org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.update(StreamingTombstoneHistogramBuilder.java:93)
    at org.apache.cassandra.io.sstable.metadata.MetadataCollector.updateLocalDeletionTime(MetadataCollector.java:218)
    at org.apache.cassandra.io.sstable.metadata.MetadataCollector.update(MetadataCollector.java:184)
    at org.apache.cassandra.db.rows.Rows.collectStats(Rows.java:106)
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$StatsCollector.applyToRow(BigTableWriter.java:284)
    at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:141)
    at org.apache.cassandra.db.ColumnIndex.buildRowIndex(ColumnIndex.java:117)
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:216)
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
    at org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.append(RangeAwareSSTableWriter.java:107)
    at org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:206)
    at org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:98)
    at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
    at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
    ... 5 common frames omitted

I tried to bootstrap multiple times; it fails with the same error every time.


Solution

  • I haven't come across this problem before so I inspected the stack trace for clues. This is the assertion in StreamingTombstoneHistogramBuilder$Spool.tryCell():

                assert cell >= 0 && point >= 0 && delta >= 0 : "Invalid arguments: cell:" + cell + " point:" + point + " delta:" + delta;
    

    This assertion checks that none of the values to be stored in the tombstone histogram are negative. In this case, it failed because point is -2144718900.

    Working down the stack trace, MetadataCollector.updateLocalDeletionTime() (line 218) confirmed my suspicion:

                estimatedTombstoneDropTime.update(newLocalDeletionTime);
    

    That point is the local deletion time, in seconds since the Unix epoch. It is stored as a Java int, which is a signed 32-bit type. A negative value means the expiration was set so far in the future that it overflowed the type and wrapped around.

    Older versions of Cassandra cannot handle expiration times later than 2038-01-19T03:14:06+00:00 (CASSANDRA-14092). If you upgraded your cluster from a version older than Cassandra 4.0, read the announcement at the top of NEWS.txt about the maximum TTL expiration date and what you need to do, in particular the detailed recovery instructions in CASSANDRA-14092.txt.

    Be aware that you are currently at risk of losing data so you need to make sure that you have offsite backups of your cluster. You will not be able to change the topology of your cluster (add or remove nodes) until you've fixed the underlying issue. Cheers!
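
To illustrate the overflow described above, here is a minimal Java sketch. It is not Cassandra code: the class and method names are made up for the example, the write time of 2,000,000,000 and the 20-year TTL are illustrative values, and it assumes the negative point in the error is a plain 32-bit wraparound of a write time plus a TTL.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Cassandra.
final class DeletionTimeOverflow {

    // Adds a TTL to a write time using int arithmetic.
    // Java int addition wraps silently on overflow, producing a negative value.
    static int expiryAsInt(int nowInSeconds, int ttlInSeconds) {
        return nowInSeconds + ttlInSeconds;
    }

    // Reinterprets a wrapped int as the unsigned number of seconds since the epoch,
    // i.e. the value the writer presumably intended (assumption: simple wraparound).
    static long unwrap(int wrapped) {
        return Integer.toUnsignedLong(wrapped);
    }

    public static void main(String[] args) {
        // The largest timestamp a signed 32-bit int can hold.
        System.out.println(Instant.ofEpochSecond(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z

        // The value reported in the AssertionError.
        int point = -2144718900;
        long intended = unwrap(point);
        System.out.println(intended);                        // 2150248396
        System.out.println(Instant.ofEpochSecond(intended)); // 2038-02-20T03:13:16Z

        // Illustrative write time plus a 20-year TTL (20 * 365 * 86400 seconds).
        System.out.println(expiryAsInt(2_000_000_000, 630_720_000)); // -1664247296
    }
}
```

Under that assumption, the failing cell was meant to expire about a month after the 2038 limit, which would be consistent with the overflow explanation.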