[SOLVED] How do I count the number of columns in a row in Cassandra?

What is the best practice for counting the number of columns? I tried to find a simple way to do this but found no information about it. I think I could write a query for this, but it may be too slow. I also know about the approach that uses a counter column, but I have no idea how it would work with columns that have a TTL. In summary, I need a method or approach in Hector to check that a row has no more than 5 columns in one minute. Thanks for your advice, and sorry for my English and the newbie question.

Solution

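I don't really think counters would be useful for this. Counter columns can't be given a TTL, so a counter would not go down when the columns it is counting expire.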

Are you trying to count the number of columns, or the number of "rows" inside a single partition (what Thrift calls a wide row)? Say your table looks like this:

create table foo ( id text, colid text, somethingcol text, othercol text, primary key (id, colid));

Are you trying to count the number of entries for id='something specific'?

Assuming that you are, you can add a timestamp column as the first clustering key like this:

create table foo ( id text, ts timestamp, colid text, somethingcol text, othercol text, primary key (id, ts, colid));

And run a query like: select count(*) from foo where id='theId' and [ts condition here];
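One possible way to fill in the [ts condition here] placeholder is a half-open range that covers the last minute. The sketch below is in Java because the question mentions Hector. It keeps the CQL as a string with bind markers and computes the start of the window. The class and method names are hypothetical, and binding the values is left to whichever client library you use.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: one possible "[ts condition here]" for the foo table above, a
// half-open window [end - 1 minute, end). Class and method names are hypothetical.
final class LastMinuteQuery {
    // Restricting ts by range is allowed here because the partition key (id) is
    // fixed by equality and ts is the first clustering column. The ? markers are
    // bind variables filled in by your client library.
    static final String COUNT_IN_WINDOW =
        "SELECT count(*) FROM foo WHERE id = ? AND ts >= ? AND ts < ?";

    private LastMinuteQuery() {}

    /** Lower bound of the one-minute window ending at {@code end}. */
    static Instant windowStart(Instant end) {
        return end.minus(Duration.ofMinutes(1));
    }
}
```

You would bind the id, windowStart(now) and now, then compare the returned count with 5.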

If you're looking to aggregate to see if there is ANY case where there are more than five entries in any one minute, then a different structure might be appropriate (e.g. you have the minute as a clustering key and entries within it inside a SET column). Whether you can use that for your original purpose is something you'll need to check.
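Here is a rough sketch of that bucketed layout, assuming one CQL row per (id, minute) with the entries for that minute kept in a set. The table and column names (foo_by_minute, minute_start, colids) are hypothetical. The Java side only shows how the client could derive the minute key and apply the "more than 5" rule to a bucket it has read back.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Sketch of the "minute as a clustering key, entries in a SET" idea.
// Hypothetical names: foo_by_minute, minute_start, colids.
final class MinuteBuckets {
    // One CQL row per (id, minute); the set holds the entries for that minute.
    static final String CREATE_TABLE =
        "CREATE TABLE foo_by_minute (id text, minute_start timestamp, colids set<text>,"
            + " PRIMARY KEY (id, minute_start))";

    // Adding an element that is already in the set leaves the set unchanged.
    static final String ADD_ENTRY =
        "UPDATE foo_by_minute SET colids = colids + ? WHERE id = ? AND minute_start = ?";

    static final String READ_BUCKET =
        "SELECT colids FROM foo_by_minute WHERE id = ? AND minute_start = ?";

    private MinuteBuckets() {}

    /** Clustering key for an event: its time truncated to the start of its minute. */
    static Instant bucketFor(Instant eventTime) {
        return eventTime.truncatedTo(ChronoUnit.MINUTES);
    }

    /** The rule from the question, applied to one bucket read with READ_BUCKET. */
    static boolean overLimit(Set<String> colidsInBucket, int limit) {
        return colidsInBucket.size() > limit;
    }
}
```

Because the minute is computed on the client, every writer has to truncate the same way, or entries for the same minute will land in different rows.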

If you don't have too many entries in a partition, it might be possible to simply query the timestamps for a partition and do the grouping and check client side. Spark is another possibility (maybe with Shark) if you want more sophisticated aggregation. Of course, that means having access to a bit more infra.
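Here is a minimal sketch of the client-side grouping. It assumes you have already read the ts values for one partition (for example with SELECT ts FROM foo WHERE id = ?) and converted them to epoch milliseconds. The code uses only the JDK, and the class and method names are hypothetical. anyMinuteOverLimit groups by calendar minute, which matches the bucketed layout. anyWindowOverLimit instead checks every 60-second span.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the client-side check. Input: the ts values of one partition,
// already converted to epoch milliseconds. Names are hypothetical.
final class PerMinuteCheck {
    private static final long ONE_MINUTE_MS = 60_000L;

    private PerMinuteCheck() {}

    /** Counts entries per calendar minute, keyed by minutes since the epoch (UTC). */
    static Map<Long, Integer> countPerMinute(List<Long> epochMillis) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long ms : epochMillis) {
            counts.merge(Math.floorDiv(ms, ONE_MINUTE_MS), 1, Integer::sum);
        }
        return counts;
    }

    /** True if any calendar minute holds more than {@code limit} entries. */
    static boolean anyMinuteOverLimit(List<Long> epochMillis, int limit) {
        for (int n : countPerMinute(epochMillis).values()) {
            if (n > limit) {
                return true;
            }
        }
        return false;
    }

    /** True if any 60-second span (sliding window) holds more than {@code limit} entries. */
    static boolean anyWindowOverLimit(List<Long> epochMillis, int limit) {
        List<Long> sorted = new ArrayList<>(epochMillis);
        Collections.sort(sorted);
        int start = 0;
        for (int end = 0; end < sorted.size(); end++) {
            // Move the left edge forward until the window spans less than one minute.
            while (sorted.get(end) - sorted.get(start) >= ONE_MINUTE_MS) {
                start++;
            }
            if (end - start + 1 > limit) {
                return true;
            }
        }
        return false;
    }
}
```

The two checks can disagree. Four entries at 12:00:59 and two at 12:01:00 stay within a limit of 5 per calendar minute, but they exceed it in a sliding window. Which reading of "no more than 5 columns in one minute" you need is up to your requirements.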

Does that help?

UPDATE: Cassandra maintains a timestamp for each cell automatically. You can access that in a query:

SELECT id, bar, baz, writetime(bar) from foo where ...;

This returns the last-write timestamp of the bar column for each entry. The timestamp is assigned by the coordinator unless the client specifies it during the write (on insert, you can add USING TIMESTAMP n if you wish to specify it). It's worth noting that this is the behaviour when using CQL, not Thrift.
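If the entries don't carry their own timestamp column, the writetime values could feed the same per-minute check. Here is a small sketch, assuming the default convention that these timestamps are microseconds since the Unix epoch. A client that writes with USING TIMESTAMP can choose any value, in which case this conversion may not apply. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Sketch: converting writetime(...) values for use in a per-minute check.
// Assumes microseconds since the Unix epoch (the usual default); names are hypothetical.
final class WriteTimes {
    private WriteTimes() {}

    /** Converts a writetime value (microseconds since the epoch) to an Instant. */
    static Instant toInstant(long writetimeMicros) {
        return Instant.EPOCH.plus(writetimeMicros, ChronoUnit.MICROS);
    }

    /** Same value in epoch milliseconds, dropping sub-millisecond precision. */
    static long toEpochMillis(long writetimeMicros) {
        return Math.floorDiv(writetimeMicros, 1_000L);
    }
}
```

Keep in mind that writetime() cannot be selected on primary key columns, so you would read it from a regular column such as bar. It also only reflects the most recent write to that cell.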