
Does Cassandra startup time depend on the number of commit log files?


I restarted my Cassandra node without draining it first. When it came back up, it took more than 20-25 minutes to start accepting client connections, because Cassandra was reading the commit log files.

So, does Cassandra startup time depend upon commit log files?

Note: I am running a single-node Cassandra v4.0 cluster.


Solution

  • Yes! If the node restarts without a “drain” operation (nodetool drain), it has to replay the unprocessed commitlog files at startup. Any writes which had not yet been flushed to disk are reapplied and written out at that point.

    Apart from replaying unprocessed commitlog files, is there any other factor that increases Cassandra boot-up time?

    Well, there are several in-memory structures which need to be built at startup (like the index summary). Additional things like the prepared statement cache get loaded as well. Processing the commitlog can also trigger a compaction, which could also slow things down.

    So sure, there are some additional things which happen, but processing the commitlog is the most time-consuming part.

    Would it be possible to estimate the total time Cassandra will take to boot up based on the size of the commitlog files?

    In theory, yes. But it depends on many factors, including the size of the files, the platform, and how the underlying disk hardware is abstracted. I've seen it take 20 minutes for a node to start with 100 or so 8GB commitlog files. But that might be different for you. I'd watch the commitlog dir in another terminal/ssh session while it starts to get a feel for the time:

    watch -n5 "ls data/commitlog/ | wc -l"
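
A minimal sketch for relating commitlog size to startup time on your own hardware: a small shell function that prints the number of segment files and their total size in bytes. The function name commitlog_summary, the default path data/commitlog and the *.log filter are assumptions made for illustration; point it at the commitlog_directory configured in your cassandra.yaml.

```shell
#!/bin/sh
# Hypothetical helper (not a Cassandra tool): print the number of commitlog
# segment files in a directory and their combined size in bytes.
# Assumes segment files end in ".log"; the default path is an assumption too.
commitlog_summary() {
    dir="${1:-data/commitlog}"

    # Count regular *.log files directly inside the directory.
    count=$(find "$dir" -maxdepth 1 -type f -name '*.log' | wc -l | tr -d ' ')

    # Sum per-file byte counts from wc -c, skipping the "total" lines that
    # wc prints when it is given more than one file.
    bytes=$(find "$dir" -maxdepth 1 -type f -name '*.log' -exec wc -c {} + |
        awk '$2 != "total" { s += $1 } END { printf "%.0f\n", s }')

    echo "segments=$count bytes=$bytes"
}
```

Running commitlog_summary data/commitlog just before a restart and noting how long the node then takes to accept client connections gives one data point (bytes replayed per minute). A few such data points are probably a better basis for an estimate than any general rule.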