I am new to Cassandra and I don't quite know if my data model is correct. I have tried to create it based on the queries I want to make in my application. I want to create and update book objects and I want to find books by author and by publish date. I am using the DataStax Node.js Driver for Cassandra (using Typescript) and here is my schema so far:
CREATE TABLE IF NOT EXISTS books_by_author (
author_id UUID,
book_id UUID,
book_name TEXT,
date_published TIMESTAMP,
PRIMARY KEY (author_id, date_published);
CREATE TABLE IF NOT EXISTS books (
book_id uuid PRIMARY KEY,
book_name text,
book_description TEXT,
date_published TIMESTAMP,
author_id uuid,
author_name TEXT,
+ many more columns for book details);
Making author_id and date_published as primary key I was able to make queries with the nodejs driver and with help from the DataStax documentation:
const q = cassandra.mapping.q;
const results = await this.bookMapper.find(
{
authorId: '1', datePublished: q.and(q.gte(start), q.lte(end)), // given timerange for publish date, works fine
},
docInfo,
options);
The above code works well; I can get the list of books by author and by specifying a date range when publised. The bookMapper is mapping both tables (books_by_author, books) so I am using it to make all my DB queries.
Then I ran into issues. I created a book in my application but I gave it the wrong publish date and I would like to change that. So, to see how it could be done, I created a unit test that saves a book to the DB, then tries to use bookMapper.update to update the book's datePublished property. Here's some pseudo code on what I tried to achieve:
const bookId = '123uuid';
const existingBook = new Book({
id: bookId,
name: 'The Book',
datePublished: '2020-07-03T13:00:00.000Z',
description: 'Book description',
author: {
id: '1',
name: 'A. Author',
}
});
... // insert existingBook to DB and read book details from DB using bookMapper.get({bookId})
const modifiedBook = new Book({
id: bookId,
name: 'The Book',
datePublished: '2020-07-02T13:00:00.000Z', // modified publish date
description: 'Modified book description', // modified the book description as well
author: {
id: '1',
name: 'A. Author',
}
});
await this.bookMapper.update(modifiedBook); // update the book
await this.bookMapper.get({bookId}); // returns the book with data from existingBook, not modifiedBook
await this.bookMapper.find(
{
authorId: '1', datePublished: q.and(q.gte(start), q.lte(end)),
},
docInfo,
options);
// query with author id, returns a list of 2 books, both the existingBook and modifiedBook ??
As you can see, the update actually created a new book row to the DB and now I have 2 books instead of 1. And I have no idea what is the correct way of updating that data. I tried to use batching:
let changes = [];
changes.push(this.bookMapper.batching.remove(exisitingBook));
changes.push(this.bookMapper.batching.insert(modifiedBook));
await this.mapper.batch(changes);
const book = await this.bookMapper.get({bookId});
--> book is null!
Using batching to remove and insert seems to work so that remove is the last call to DB, it doesn't matter in which order I add those statements to my changes array, and it removes the book causing my last get statement to return null.
I wanted to use batching to make the operation atomic. I don't want to end up in a situation where I first delete the existing book and then insert the new book in separate DB calls without batching because if some error occurs after delete but before insert, then I will have lost my book data from the DB.
My question: What is the correct way to update the book data when the updated property happens to be part of a primary key? Thank you.
This is a well known "feature" of Cassandra - in the batch the both statements are getting the same timestamp, so the DELETE
operation wins over the INSERT
. The only solution to fix that is to explicitly set timestamps for every operation, with timestamp for DELETE
lower than INSERT
. I'm not Node.js developer, so it how it should be looking in pseudo-code/CQL (Node.js mapper should support setting custom timestamp on statements):
TS=currentTimestampInMicroseconds
BEGIN BATCH
DELETE FROM table USING TIMESTAMP TS-1 WHERE PK = ... US;
INSERT INTO table (....) VALUES (....) USING TIMESTAMP TS;
APPLY BATCH;