cassandra

Is there any documentation that explains the internal details of Cassandra CDC?


I am working on a POC in which I need to parse Cassandra CDC logs. In the latest Cassandra version, 4.1, the CDC directory contains _cdc.idx files which point to the durable writes in a CommitLog. Is there one .idx file per CommitLog file, i.e. a 1-1 correspondence? How does it work?

I tried out some scenarios and found the following:

  1. Sometimes only the idx file is present and there is no CommitLog.
[root@xxx cdc_raw]# ls
CommitLog-7-1682328691317_cdc.idx 
  2. Sometimes there are multiple CommitLog files and only a single idx file.
[root@xxx cdc_raw]# ls
CommitLog-7-1682328691317_cdc.idx  CommitLog-7-1682486101658.log  CommitLog-7-1682492980412.log
CommitLog-7-1682328691319.log      CommitLog-7-1682486101659.log  CommitLog-7-1682492980413.log

Which file does the offset in this idx file point to? How does it work? I searched many sites for a specific answer, including https://cassandra.apache.org/doc/latest/cassandra/operating/cdc.html, but couldn't find one. Can somebody help me understand the relationship between the idx file and the CommitLog in Cassandra CDC?


Solution

  • Each CDC index file (*_cdc.idx) maps to a corresponding commitlog segment.

    The filenames follow this format:

        CommitLog-<version>-<segment_id>.log
        CommitLog-<version>-<segment_id>_cdc.idx

    So the file CommitLog-7-1682328691317_cdc.idx (which has segment ID 1682328691317) is the index for commitlog CommitLog-7-1682328691317.log.
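
To make the naming rule concrete, here is a minimal Python sketch that pairs each index file with its segment by segment ID. The regular expression and the helper names (`parse_name`, `pair_by_segment`) are my own illustration, not anything shipped with Cassandra. It assumes the segment and index filenames differ only in their suffix, as in the example above.

```python
import re
from collections import defaultdict

# Assumed pattern: CommitLog-<version>-<segment_id> followed by .log or _cdc.idx.
NAME_RE = re.compile(r"^CommitLog-(\d+)-(\d+)(\.log|_cdc\.idx)$")


def parse_name(filename):
    """Return (version, segment_id, kind), or None if the name does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    kind = "log" if m.group(3) == ".log" else "idx"
    return int(m.group(1)), int(m.group(2)), kind


def pair_by_segment(filenames):
    """Group filenames by segment ID: {segment_id: {"log": name, "idx": name}}."""
    segments = defaultdict(dict)
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None:
            _, segment_id, kind = parsed
            segments[segment_id][kind] = name
    return dict(segments)


# The second directory listing from the question.
listing = [
    "CommitLog-7-1682328691317_cdc.idx",
    "CommitLog-7-1682486101658.log",
    "CommitLog-7-1682492980412.log",
    "CommitLog-7-1682328691319.log",
    "CommitLog-7-1682486101659.log",
    "CommitLog-7-1682492980413.log",
]

for segment_id, files in sorted(pair_by_segment(listing).items()):
    print(segment_id, files.get("log"), files.get("idx"))
```

Run against the listing from the question, this shows that the index for segment 1682328691317 has no matching .log file in the directory, and that none of the six .log files has an index of its own. In other words, the single idx file does not refer to any of the segments listed next to it.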

    Unfortunately there isn't a document that explains all the details. I have to confess that I haven't spent much time on the CDC feature so I had to do some research myself.

    I would recommend having a look at CASSANDRA-8844, which is effectively the design document for CDC, and CASSANDRA-12148, which discusses the implementation of the CDC index files.

    I also spent some time looking at the code to understand how it works.

    CommitLogDescriptor.java shows the filename convention for the segments and index files.

    For details of what is stored in the index files, have a look at CommitLogSegment.writeCDCIndexFile(). Cheers!
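
As far as I can tell from that method and from the CDC page of the Cassandra documentation, the index file is a small text file. The first line is an integer offset into the matching segment, up to which data has been persisted to disk, and a second line containing COMPLETED is added once Cassandra has finished with the segment. Below is a minimal Python sketch of a reader under that assumption. The `read_cdc_index` helper is hypothetical and not part of any Cassandra tooling.

```python
from pathlib import Path


def read_cdc_index(idx_path):
    """Return (offset, completed) parsed from a *_cdc.idx file.

    Assumes line 1 is an integer offset and an optional line 2 reads COMPLETED.
    """
    lines = Path(idx_path).read_text().splitlines()
    if not lines:
        raise ValueError(f"empty index file: {idx_path}")
    offset = int(lines[0].strip())
    completed = len(lines) > 1 and lines[1].strip() == "COMPLETED"
    return offset, completed
```

Under the same assumption, a consumer would read the matching .log file only up to `offset` bytes while `completed` is false, and would treat the segment as final once `completed` is true.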