cassandradatastaxeventual-consistency

What happens to the writetime when 2 rows with the same primary key are inserted at the same time in different nodes?


I wonder if the consistency level used in cassandra has an effect on the WriteTime (time a cell is inserted/updated in cassandra).

Context

If you have the following table...

CREATE TABLE IF NOT EXISTS table1 (
  field_1        INT,
  field_2        INT,
  field_3        INT,
  PRIMARY KEY (field_1)
);

And that you insert the following events (at the same time) in an environment with multiple cassandra nodes and a consistency level of ONE...

INSERT INTO table1(field_1, field_2, field_3) VALUES (1, 1, 1); //A
INSERT INTO table1(field_1, field_2, field_3) VALUES (1, 1, 2); //B

Then, you expect to have { field_1 = 1, field_2 = 1, field_3 = 2 } when querying the table (after the internal update of all the node), because the second insert did override the first one:

SELECT * FROM table1 WHERE field_1 = 1;

Issue

However, what happened in the cluster? Was the row inserted (as A) and then modified (as B)? Or was the row directly inserted (as B, without transiting by the A state)?

I'm asking that because the following command...

SELECT
  WRITETIME(field_2) as unchanged_cell,
  WRITETIME(field_3) as updated_cell 
FROM table1
WHERE field_1 = 1;

Returns the same value for unchanged_cell and updated_cell. And I was expected unchanged_cell to be < than updated_cell. Indeed, field_1 was inserted by A, and did not change in B (so it should have a creation time = to WriteTime(A)) and field_2 was inserted by A and then updated by B (so it should have a creation time = to WriteTime(B), so greater then WriteTime(A)).

Is it the normal scenario? Will it change if I use a consistency of QUORUM or ALL instead?


Solution

  • Two rows inserted at the same time, by definition, will have the same WRITETIME(). If you think about it, they were written at the same time so it just follows that they would have the same write time.

    Also, neither (1) the consistency level, nor (2) the coordinator for the write request have any bearing on the value of the write time since ALL mutations are sent to ALL replicas in ALL data centres.

    The coordinator is only responsible for "coordinating" the request by sending the mutations to all replicas and requiring a write acknowledgement from CL replicas so it doesn't matter which node was picked as the coordinator.

    As a side note, timestamp ties are rare unless you explicitly specify the same timestamp with USING TIMESTAMP ... in your application. In any case, for two columns with the same write timestamp, the larger value (based on bytes) "wins". Cheers!