
Necessity of repair before Cassandra version upgrade


Our production version of DSE is 4.8.4 (Cassandra 2.1.12). We run a 3-node cluster with 256 vnodes per node, ~200 GB of data per node, and RF=3. We are going to migrate step by step to the latest DSE version, 5.1.1 (Cassandra 3.10.0).

According to the DataStax upgrade manual (http://docs.datastax.com/en/upgrade/doc/upgrade/datastax_enterprise/upgdDSE50.html), a repair should be done before starting the upgrade. We don't use incremental repairs, so to repair the entire cluster we ran a full sequential repair on a single node. After 12 hours, 100 of the 768 token ranges have been repaired, but CPU usage is quite high and the number of SSTables for one of our tables is increasing almost linearly. We have several issues with this table during normal operation as well, and one of the reasons for upgrading is to replace the existing DTCS compaction strategy with the new TWCS.
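For a rough sense of scale, the progress figures above can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the repair keeps its current average pace, which may not hold if token ranges differ in size or the growing SSTable count slows things down. The function name `estimate_repair_hours` is made up for illustration.

```python
def estimate_repair_hours(ranges_done, ranges_total, hours_elapsed):
    """Linearly extrapolate total and remaining repair time.

    Assumption: every token range takes the same time on average.
    """
    if ranges_done <= 0:
        raise ValueError("need at least one repaired range to extrapolate")
    hours_per_range = hours_elapsed / ranges_done
    total = hours_per_range * ranges_total
    return total, total - hours_elapsed


# Figures from the question: 3 nodes x 256 vnodes = 768 token ranges,
# 100 of which were repaired in the first 12 hours.
nodes, vnodes_per_node = 3, 256
ranges_total = nodes * vnodes_per_node
total_hours, remaining_hours = estimate_repair_hours(100, ranges_total, 12)

print(f"total ~{total_hours:.1f} h, remaining ~{remaining_hours:.1f} h")
# prints: total ~92.2 h, remaining ~80.2 h
```

At that pace the full repair would take close to four days, which is the duration we are worried about.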

We are concerned about the long repair duration and the growing resource utilization. Is a repair 100% necessary before the upgrade? What are the consequences of doing it or not doing it? If we upgrade through several versions one after another, should we perform a read repair after each upgrade?


Solution

  • Running a read repair before any node maintenance is required to prevent data loss. Data loss is possible if the node under maintenance is the only one holding some portion of the data and that node breaks completely during the maintenance.