I want to use the key-value pair feature of Cassandra. Until now, I have been using Kyotocabinet but it does not support multiple writes and hence, I want to use Cassandra for versioning my tabular data.
Roll No, Name, Age, Sex
14BCE1008, Aviral, 22, Male
14BCE1007, Shantanu, 22, Male
The above data is tabular(csv). It's version 1. Next is version 2:
Roll No, Name, Age, Sex
14BCE1008, Aviral, 22, Male
14BCE1007, Shantanu, 22, Male
14BCE1209, Piyush, 22, Male
Hence, I would call the above version as version 2 with the following diff:
insert_patch
: 14BCE1209
as key(PK) and 14BCE1209, Piyush, 22, Male
as value.
I am familiar with the creation of the table but unable to figure out the versioning part.
To have multiple versions of data in your table if you use composite primary key instead of primary key consisting of one field.
So table definition could look as following (if you "know" the version number prior inserting the data):
create table test(
id text,
version int,
payload text,
primary key (id, version)
) with clustering order by (version desc);
and inserting data as:
insert into test (id, version, payload) values ('14BCE1209', 1, '....');
insert into test (id, version, payload) values ('14BCE1209', 2, '....');
to select the latest value for given key you can use LIMIT 1
:
SELECT * from test where id = '14BCE1209' LIMIT 1;
and to select latest versions for all partitions (not recommended, just for example - need a special approach for effective processing):
SELECT * from test PER PARTITION LIMIT 1;
But this will work only for cases when you know version in advance. If you don't know, then you can use timeuuid
type for version instead of the int
:
create table test(
id text,
version timeuuid,
payload text,
primary key (id, version)
) with clustering order by (version desc);
and inserting data as (instead of now()
you can use current timestamp generated from your code):
insert into test (id, version, payload) values ('14BCE1209', now(), '....');
insert into test (id, version, payload) values ('14BCE1209', now(), '....');
and select will work the same as above...