
How to set a server-side timestamp when creating a new record with the Python driver for Cassandra


I have a Cassandra model as shown below.

from uuid import uuid4
from uuid import uuid1

from cassandra.cqlengine import columns, connection
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table


class BaseModel(Model):
    __abstract__ = True

    id = columns.UUID(primary_key=True, default=uuid4)
    created_timestamp = columns.TimeUUID(primary_key=True,
                                         clustering_order='DESC',
                                         default=uuid1)
    deleted = columns.Boolean(required=True, default=False)

class OtherModel(BaseModel):
    __table_name__ = 'other_table'
    name = columns.Text(required=True, default='')



if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)

    OtherModel.create(id='d43ca2c3-b670-4efc-afd7-b46ada88c3fc', name='test')

When I create a record, created_timestamp is set from the clock of my own system, that is, the machine where I execute this code.

My system and the Cassandra server have different clock times.

If I run this on a system whose clock reads 2017-01-13 10:20:30, created_timestamp is set to that time. If I then run it again from another system whose clock reads 2017-01-13 10:20:20, created_timestamp is set to that earlier time.

When I run a query like

select * from test.other_table where id=d43ca2c3-b670-4efc-afd7-b46ada88c3fc limit 1;

It should return the most recently inserted record, but because the clocks on the inserting systems differ, it returns the record that was inserted first.


Solution

  • The schema created by the above Python code is:

    CREATE TABLE test.other_table (
        id uuid,
        created_timestamp timeuuid,
        deleted boolean,
        name text,
        PRIMARY KEY (id, created_timestamp)
    ) WITH CLUSTERING ORDER BY (created_timestamp DESC)
    

    Given your example, created_timestamp is part of the primary key, so the table will contain two distinct rows, one for 10:20:30 and one for 10:20:20. With the clustering order set to DESC, your read query returns rows sorted by created_timestamp, largest first, so the first row is 2017-01-13 10:20:30. The order in which the rows were inserted doesn't matter because created_timestamp is a clustering column.
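
To make the ordering concrete, here is a minimal sketch using only the Python standard library, with no Cassandra connection. The helpers timeuuid_from_datetime and embedded_time are hypothetical names introduced for illustration, and sorting by the embedded time only approximates how Cassandra compares timeuuid values. It builds two version-1 UUIDs for the two client clocks from the question and sorts them newest first:

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_datetime(dt, clock_seq=0, node=0):
    """Hypothetical helper: build a version-1 UUID whose embedded time is dt."""
    micros = (dt - UNIX_EPOCH) // timedelta(microseconds=1)
    ticks = micros * 10 + UUID_EPOCH_OFFSET
    fields = (
        ticks & 0xFFFFFFFF,       # time_low
        (ticks >> 32) & 0xFFFF,   # time_mid
        (ticks >> 48) & 0x0FFF,   # time_hi (version bits are set by uuid.UUID)
        (clock_seq >> 8) & 0x3F,  # clock_seq_hi (variant bits are set by uuid.UUID)
        clock_seq & 0xFF,         # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields, version=1)


def embedded_time(u):
    """Recover the datetime stored inside a version-1 UUID."""
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return UNIX_EPOCH + timedelta(microseconds=micros)


# Insert order: client A writes first, client B (clock 10 s behind) writes second.
inserted = [
    ("client A, inserted first",
     timeuuid_from_datetime(datetime(2017, 1, 13, 10, 20, 30, tzinfo=timezone.utc))),
    ("client B, inserted second",
     timeuuid_from_datetime(datetime(2017, 1, 13, 10, 20, 20, tzinfo=timezone.utc))),
]

# Approximate CLUSTERING ORDER BY (created_timestamp DESC): newest embedded time first.
rows = sorted(inserted, key=lambda row: row[1].time, reverse=True)

for label, tuuid in rows:
    print(label, embedded_time(tuuid).isoformat())
```

The row written by client A sorts first even though client B wrote later, because only the time embedded in the UUID is compared.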

    If created_timestamp were not part of the primary key, Cassandra would return only the latest value. Cassandra keeps an internal timestamp on every cell that records when the cell was inserted or updated. Cassandra's merge process uses it during a read request to determine the last value written. This timestamp is normally assigned for you rather than set by application code (CQL does allow an explicit override with USING TIMESTAMP), and you can view it with the CQL writetime() function.
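
As a rough illustration of that merge rule, the sketch below picks the version of a cell with the highest write timestamp. It is a simplification in plain Python, not Cassandra's actual implementation: the Cell type, the reconcile function and the timestamp values are made up for the example, and tie-breaking between equal timestamps is not modelled.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    write_time: int  # microseconds since the Unix epoch, like writetime()


def reconcile(cells):
    """Return the cell with the highest write timestamp (ties not modelled)."""
    return max(cells, key=lambda cell: cell.write_time)


# Two versions of the same cell; the order they are listed in does not matter.
versions = [
    Cell("newer value", 2_000_000),
    Cell("older value", 1_000_000),
]

winner = reconcile(versions)
print(winner.value)
```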

    For example,

    select id, dateOf(created_timestamp), writetime(name) from other_table;
    

    will return:

     id                                   | system.dateof(created_timestamp) | writetime(name)
    --------------------------------------+----------------------------------+------------------
     d43ca2c3-b670-4efc-afd7-b46ada88c3fc |         2017-01-14 23:09:08+0000 | 1484435348108365
     d43ca2c3-b670-4efc-afd7-b46ada88c3fc |         2017-01-14 23:07:30+0000 | 1484435250481046
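
The writetime() values are microseconds since the Unix epoch. A small standard-library sketch (writetime_to_datetime is a hypothetical helper name) converts the two values above into readable UTC times, which line up with the dateOf() column:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_datetime(micros):
    """Convert a writetime() value (microseconds since the epoch) to a UTC datetime."""
    return UNIX_EPOCH + timedelta(microseconds=micros)


for wt in (1484435348108365, 1484435250481046):
    print(wt, writetime_to_datetime(wt).isoformat())
```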
    

    If you wish to use the coordinator's timestamp, you will have to use a CQL statement instead of the object-mapper:

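    import uuid
    from cassandra.cluster import Cluster

    # Connect to the default local node and use the "test" keyspace.
    cluster = Cluster()
    session = cluster.connect("test")

    # now() is evaluated on the coordinator, so the timeuuid reflects the
    # server's clock rather than the client's.
    stmt = session.prepare(
        """
        INSERT INTO test.other_table (id, created_timestamp) VALUES (?, now())
        """
    )
    session.execute(stmt, [uuid.uuid4()])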