cassandracassandra-2.2

Cassandra: Fixed number of rows in a table


I want to create a table with fixed number of rows (lets say N), where if N+1th row was added, then 1st row would be removed.

This is the table, I use for storage of last N best results from graph analysis:

CREATE TABLE IF NOT EXISTS lp_registry.best (
    value float, // best value for current graph
    verts int,   // number of vertices in graph
    edges int,   // number of edges in graph
    wid text,    // worker id       
    id timeuuid, // timeuuid
    PRIMARY KEY (wid, id)
) WITH CLUSTERING ORDER BY (id ASC);

I've read about expiring data at DataStax, but found only TTL expirations. So I decided to do it in following way.

My Approach A:

Everytime a new result is wanted to be added, id of oldest row is retrieved..

SELECT wid, id FROM lp_registry.best LIMIT 1;

..as well as current number of rows..

SELECT COUNT(*) FROM FROM lp_registry.best;

Consequently if count >= N, then the oldest row is removed and the newest is added...

BEGIN BATCH
    INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ? now());
    DELETE FROM lp_registry.best WHERE wid = ? AND id = ?;
APPLY BATCH;

This approach has problem with that first selects are not atomic operations together with the following batch. So if any other worker deleted oldest row between select and batch, or N was exceeded, then this wouldn't work.

My Approach B:

Same first steps ...

SELECT wid, id FROM lp_registry.best LIMIT 1;
SELECT COUNT(*) FROM FROM lp_registry.best;

Then try to delete oldest row again and again until success..

if count < N {
  INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ? now());
} else { 
  while not success {
    DELETE FROM lp_registry.best WHERE wid = ? AND id = ? IF EXISTS;
  }
  INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ? now());
}

In this approach there is still trouble with exceeding N in the database, before count < N is checked.

Can you point me to the right solution?


Solution

  • Here is my solution. At first we need to create a table that will store current number of rows...

    CREATE TABLE IF NOT EXISTS row_counter (
      rmax int,  // maximum allowed number of rows
      rows int,  // current number of rows
      name text, // name of table
      PRIMARY KEY (name)
    );
    

    Then initialize it for a given fixed-rows tables:

    INSERT INTO row_counter (name, rmax, rows) 
    VALUES ('best', 100, 0);
    

    These are the statements used in the following code:

    q1 = "SELECT rows, rmax FROM row_counter WHERE name = 'best'";
    q2 = "UPDATE row_counter SET rows = ? WHERE name = 'best' IF rows < ?";
    q3 = "SELECT wid, id FROM best LIMIT 1";
    q4 = "DELETE FROM best WHERE wid = ? AND id = ? IF EXISTS";
    q5 = "INSERT INTO best (vertex, value, verts, edges, wid, id) VALUES (?, ?, ?, ?, ?, now())";
    
    selectCounter = session.prepare(q1);
    updateCounter = session.prepare(q2);
    selectOldBest = session.prepare(q3);
    deleteOldBest = session.prepare(q4);
    insertNewBest = session.prepare(q5);
    

    Solution in Java:

    // Success indicator
    boolean succ = false;
    
    // Get number of registered rows in the table with best results
    Row row = session.execute(selectCounter.bind()).one();
    int rows = row.getInt("rows") + 1;
    int rmax = row.getInt("rmax");
    
    // Repeatedly try to reserve empty space in table
    while (!succ && rows <= rmax) {
      succ = session.execute(updateCounter.bind(rows, Math.min(rows, rmax))).wasApplied();
      rows = session.execute(selectCounter.bind()).one().getInt("rows") + 1;
    }
    
    // If there is not empty space in table, repeatedly try to make new empty space
    while (!succ) {
      row = session.execute(selectOldBest.bind()).one();
      succ = session.execute(deleteOldBest.bind(row.getString("wid"), row.getUUID("id"))).wasApplied();
    }
    
    // Insert new row
    session.execute(insertNewBest.bind(vertex, value, verts, edges, workerCode));