cassandra

SELECT on a specific partition key does not return data


Problem Description:

We have a table in Cassandra version 3.11.7. After migrating three nodes to another availability zone, I encountered an issue where I cannot retrieve data by the primary key, despite the data being successfully inserted.

Table schema:

CREATE TABLE m_operational.app_settings (
    endpoint_id text PRIMARY KEY,
    format text,
    settings_json text,
    platform text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Select query that i trying to execute:

select * from m_operational.app_settings where endpoint_id='endpoint-i-need';

And I get nothing even after insert query:

INSERT INTO m_operational.app_settings (endpoint_id, format, settings_json, platform) VALUES( 'endpoint-i-need', 'free', '{""settings"": ""my-settings""}', 'PC');

tried to create a similar record next to it but with a different key - everything worked:

INSERT INTO m_operational.app_settings (endpoint_id, format, settings_json, platform) VALUES( 'endpoint-i-need1', 'free', '{""settings"": ""my-settings""}', 'PC');
select * from m_operational.app_settings where endpoint_id='endpoint-i-need1';

This problem started to arise during the migration of 3 Cassandra nodes from one availability zone to another and remains to this day (There are 9 nodes in the cluster).

The migration was carried out as follows:

After migration I execute nodetool cleanup on each of the nodes and then ran a repair with the command nodetool repair --full -pr -j 2 on each of the nodes. Nothing has changed and the select still doesn't work. Then I tried to run repair for a specific table on all nodes with nodetool repair --full -pr -j 2 m_operational app_settings - that didn't help either.

What can I look at in Cassandra to understand the cause of the problem? or how can i fix this problem?


Solution

  • My best guess is that someone deleted the partition with a timestamp set in the future so the tombstone has a later date than any of the INSERT statements you issue.

    This would be simple to prove with a bit of investigation. First is to identify the replicas which own the partition with:

    $ nodetool getendpoints -- m_operational app_settings <pkey>
    

    Then for each of the replicas, identify the SSTables which contains the partition with:

    $ nodetool getsstables -- m_operational app_settings <pkey>
    

    Go through the contents of each SSTable using sstabledump and look for a tombstone, particularly the deletion date/time:

    $ sstabledump -k <pkey> /path/to/data/.../...Data.db
    

    WARNING - sstabledump is an offline tool. Cassandra must be stopped before using the utility otherwise the node's normal operation may be compromised.

    When you identify the SSTable(s) that contains the "future" tombstone, you will need to move it (and its corresponding components like *-Statistics.db, *-Index.db, etc) out of the data directory since there is no way to remove tombstones with a deletion time set for the future. Cheers!