cassandra

Why does my read operation go to SSTable when updated data is in Memtable?


I have data in the format of (id, data), such as (1, "someDataS").

However, I’m confused about what happens after updating older data that is already in the SSTable.

For example, if I update a data item that is currently in the SSTable, I expect the Memtable to hold the new version, while the older version remains in the SSTable. But when I perform a read after this update, it still checks the SSTable, even though a newer version should be in the Memtable.

Question: Why doesn’t the read operation return the updated data directly from the Memtable, where the latest version is stored? Is there a reason it still checks the SSTable?

I used the query tracing feature to debug this, and it led me to believe the relevant code is in the following file: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java

More specifically, the "queryMemtableAndSSTablesInTimestampOrder" method. To me it looks like it always checks the SSTable.


Solution

  • In the general case, the memtable alone cannot tell you that the SSTables hold nothing you still need.

    Examples:

    The last two in particular mean that no micro-optimisation based on the table schema can rule out the first two scenarios.
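
    As a rough illustration of why the memtable cannot be trusted on its own, here is a toy model of a per-cell, timestamp-based merge. This is a sketch, not Cassandra's actual read path: the `merge_rows` helper, the extra column named `extra`, and the timestamp values are all hypothetical and exist only for this example. It assumes two things about Cassandra that do hold in general: an update writes only the columns it touches, and the cell with the highest write timestamp wins regardless of where it is stored.

```python
from typing import Dict, Tuple

# A "cell" is (value, write_timestamp). A row maps column name -> cell.
# These types are illustrative only and do not mirror Cassandra's classes.
Cell = Tuple[str, int]
Row = Dict[str, Cell]


def merge_rows(*sources: Row) -> Row:
    """Merge row fragments cell by cell; the highest timestamp wins."""
    merged: Row = {}
    for source in sources:
        for column, (value, ts) in source.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


# Older, flushed data: the full row lives in the SSTable.
# "extra" is a hypothetical second column added for the example.
sstable_row: Row = {"data": ("someDataS", 100), "extra": ("kept", 100)}

# Case 1: a later update touches only one column, so the memtable
# holds a fragment of the row, not the whole row.
memtable_partial: Row = {"data": ("newData", 200)}
result_partial = merge_rows(memtable_partial, sstable_row)

# Case 2: a write carrying an explicit, older timestamp lands in the
# memtable after the flush, yet the SSTable cell is still the newer one.
memtable_stale: Row = {"data": ("lateArrival", 50)}
result_stale = merge_rows(memtable_stale, sstable_row)

print(result_partial)
print(result_stale)
```

    In the first case the memtable is missing the `extra` column, and in the second its cell loses to the SSTable's on timestamp. In both cases, a read that stopped at the memtable would return an incomplete or wrong row.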