cassandra

Why does Cassandra use different conditions for racks > replication factor and racks == replication factor?


Scenario: Number of Racks > RF (RF=2, 3 Racks)

Racks: R1, R2, R3

Nodes: N1 (R1), N2 (R2), N3 (R3), N4 (R1), N5 (R2), N6 (R3)

Walking the ring:

Primary Replica: Placed on N1 (R1).

Next Replica: Placed on N2 (R2).

Scenario: RF == Number of Racks (RF=3, 3 Racks)

Racks: R1, R2, R3

Nodes: N1 (R1), N2 (R2), N3 (R3), N4 (R1), N5 (R2), N6 (R3)

Walking the ring:

Primary Replica: Placed on N1 (R1).

Next Replica: Placed on N2 (R2).

Next Replica: Placed on N3 (R3).

It seems that in both scenarios every replica ends up on a unique rack, so I must be missing the reason this condition exists: https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java#L257


Solution

  • You are right that each case ensures unique racks.

    However, when the number of racks equals the replication factor, you also guarantee that each piece of data is stored exactly once in each rack. This makes allocation far easier, because the data is automatically spread evenly between the racks.

    When racks > replication factor, this is no longer true. In your example with RF=2 and 3 racks, each piece of data is placed on only a subset of the racks (two of the three). This makes spreading the data evenly a bit more complicated.

    Here is how this relates to the strategy. With racks = RF, every node can be treated separately, because the data is already spread evenly across racks by the simple fact that each rack gets the same amount of data.

    However, in the racks > RF case, you have to group the nodes by rack to calculate a proper token distribution, since placement matters a lot more.
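
To make the difference concrete, here is a minimal Python sketch of the rack-aware ring walk from the question. It is a toy model, not Cassandra's actual allocator: the ring size `RING`, the uneven token values in `NODES`, and the helpers `replicas_for` and `rack_load` are all assumptions made up for illustration. It walks clockwise from each range's owner, takes one node per unseen rack, and sums how much of the ring each rack ends up storing.

```python
from collections import defaultdict
from fractions import Fraction

RING = 120  # toy ring size (assumption); real token ranges are far larger

# Hypothetical layout: token -> (node, rack). Tokens are deliberately uneven.
NODES = {
    0: ("N1", "R1"),
    10: ("N2", "R2"),
    30: ("N3", "R3"),
    60: ("N4", "R1"),
    70: ("N5", "R2"),
    90: ("N6", "R3"),
}


def replicas_for(start_index, tokens, rf):
    """Walk the ring clockwise from the owner, taking one node per unseen rack."""
    chosen, seen_racks = [], set()
    for step in range(len(tokens)):
        node, rack = NODES[tokens[(start_index + step) % len(tokens)]]
        if rack not in seen_racks:
            seen_racks.add(rack)
            chosen.append((node, rack))
        if len(chosen) == rf:
            break
    return chosen


def rack_load(rf):
    """Return the fraction of the whole ring that each rack stores."""
    tokens = sorted(NODES)
    load = defaultdict(Fraction)
    for i, token in enumerate(tokens):
        # The node at `token` owns the range (previous token, token].
        width = (token - tokens[i - 1]) % RING
        for _node, rack in replicas_for(i, tokens, rf):
            load[rack] += Fraction(width, RING)
    return dict(load)


for rf in (3, 2):
    shares = {rack: str(share) for rack, share in rack_load(rf).items()}
    print(f"RF={rf}: {shares}")
```

In this toy layout, RF=3 gives every rack a share of exactly 1 (one full copy of the data) no matter how uneven the tokens are, so balancing only has to happen among the nodes inside a rack. With RF=2 the same tokens give R1, R2 and R3 shares of 5/6, 2/3 and 1/2, which is why the rack-level placement has to be taken into account when choosing tokens.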