python caching cassandra thrift pycassa

Set the key cache and row cache for a column family using pycassa?


I'd like to know whether pycassa can set the keys_cached and rows_cached fields on a specific column family (or even a keyspace), as shown here with the Cassandra CLI: http://www.datastax.com/docs/1.0/configuration/storage_configuration

I found the alter_column_family function, which takes a key_cache_size argument, in the docs: http://pycassa.github.io/pycassa/api/pycassa/system_manager.html

But when I check the cache values with pycassaShell after setting the key cache size of a column family ( http://pycassa.github.io/pycassa/assorted/pycassa_shell.html ), it still tells me:

Row Cache:                       None%
Key Cache:                       None%

There is also no difference in performance or memory usage afterwards. Since alter_column_family takes **kwargs and apparently doesn't check them, the call succeeds with any argument name, so I suspect key_cache_size corresponds to nothing. I also couldn't find any documentation of the possible optional arguments.

Here is the Cassandra log when it receives the alter_column_family call:

INFO 17:39:26,338 Update ColumnFamily '53c7deadcc9b10271a2df9f0/B' From org.apache.cassandra.config.CFMetaData@3fbd01a[cfId=2dd38542-82ea-381f-be51-44a30af61f24,ksName=53c7deadcc9b10271a2df9f0,cfName=B,cfType=Standard,comparator=org.apache.cassandra.db.marshal.IntegerType,comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,replicateOnWrite=true,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.DoubleType,keyValidator=org.apache.cassandra.db.marshal.IntegerType,minCompactionThreshold=4,maxCompactionThreshold=32,column_metadata={java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=6b6579, validator=org.apache.cassandra.db.marshal.IntegerType, type=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]=ColumnDefinition{name=76616c7565, validator=org.apache.cassandra.db.marshal.DoubleType, type=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=636f6c756d6e31, validator=org.apache.cassandra.db.marshal.IntegerType, type=CLUSTERING_KEY, componentIndex=null, indexName=null, indexType=null}},compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionOptions={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=<null>,memtable_flush_period_in_ms=0,caching=KEYS_ONLY,defaultTimeToLive=0,speculative_retry=NONE,indexInterval=128,populateIoCacheOnFlush=false,droppedColumns={},triggers={}] To 
org.apache.cassandra.config.CFMetaData@11131f6f[cfId=2dd38542-82ea-381f-be51-44a30af61f24,ksName=53c7deadcc9b10271a2df9f0,cfName=B,cfType=Standard,comparator=org.apache.cassandra.db.marshal.IntegerType,comment=,readRepairChance=0.1,dclocalReadRepairChance=0.0,replicateOnWrite=true,gcGraceSeconds=864000,defaultValidator=org.apache.cassandra.db.marshal.DoubleType,keyValidator=org.apache.cassandra.db.marshal.IntegerType,minCompactionThreshold=4,maxCompactionThreshold=32,column_metadata={java.nio.HeapByteBuffer[pos=0 lim=7 cap=7]=ColumnDefinition{name=636f6c756d6e31, validator=org.apache.cassandra.db.marshal.IntegerType, type=CLUSTERING_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]=ColumnDefinition{name=6b6579, validator=org.apache.cassandra.db.marshal.IntegerType, type=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}, java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]=ColumnDefinition{name=76616c7565, validator=org.apache.cassandra.db.marshal.DoubleType, type=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}},compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionOptions={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=<null>,memtable_flush_period_in_ms=0,caching=KEYS_ONLY,defaultTimeToLive=0,speculative_retry=NONE,indexInterval=128,populateIoCacheOnFlush=false,droppedColumns={},triggers={}]
INFO 17:39:26,349 CFS(Keyspace='system', ColumnFamily='schema_columnfamilies') liveRatio is 5.344978165938865 (just-counted was 5.344978165938865).  calculation took 0ms for 25 cells
INFO 17:39:26,349 Enqueuing flush of Memtable-schema_keyspaces@2042821730(138/8832 serialized/live bytes, 3 ops)
WARN 17:39:26,350 setting live ratio to maximum of 64.0 instead of Infinity
INFO 17:39:26,351 CFS(Keyspace='system', ColumnFamily='schema_keyspaces') liveRatio is 64.0 (just-counted was 64.0).  calculation took 1ms for 0 cells
INFO 17:39:26,351 Writing Memtable-schema_keyspaces@2042821730(138/8832 serialized/live bytes, 3 ops)
INFO 17:39:26,368 Completed flushing var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-79-Data.db (177 bytes) for commitlog position ReplayPosition(segmentId=1405696976943, position=238330)
INFO 17:39:26,373 Enqueuing flush of Memtable-schema_columnfamilies@200691258(1145/6120 serialized/live bytes, 25 ops)
INFO 17:39:26,373 Writing Memtable-schema_columnfamilies@200691258(1145/6120 serialized/live bytes, 25 ops)
INFO 17:39:26,381 Completed flushing var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-74-Data.db (787 bytes) for commitlog position ReplayPosition(segmentId=1405696976943, position=238330)

There's an interesting caching=KEYS_ONLY attribute, which seems to contradict the pycassaShell output, but there is nothing about the size of the key cache and nothing about the row cache. I get the very same output whether I pass key_cache_size=200000, 0 or None.

So does anyone know how to do this through pycassa?


Solution

  • What version of Cassandra are you using? pycassa is a bit out of date when it comes to column family attributes (especially within pycassaShell).

    key_cache_size and row_cache_size were replaced by global options (key_cache_size_in_mb, row_cache_size_in_mb) in cassandra.yaml in Cassandra 1.1. The caching option is the one you should be setting per-column family, and it can be set to KEYS_ONLY, ROWS_ONLY, ALL, or NONE.

    As a side note, at this point you should generally be using cqlsh instead of cassandra-cli and the python CQL driver instead of pycassa (I maintain both), especially for new projects.
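
To make the per-column-family caching option concrete, here is a minimal sketch rather than a definitive recipe. The helper names (normalize_caching, set_caching, alter_table_cql) are hypothetical. It assumes that the pycassa version in use forwards a caching keyword from alter_column_family to the Thrift column family definition, which is worth verifying against your installed version. The CQL form assumes the string-valued caching option of Cassandra 1.2/2.0; later releases changed that syntax.

```python
# The four per-column-family caching modes named in the answer above.
VALID_CACHING = ("ALL", "KEYS_ONLY", "ROWS_ONLY", "NONE")


def normalize_caching(value):
    """Return the caching mode in upper case, or raise ValueError."""
    mode = str(value).strip().upper()
    if mode not in VALID_CACHING:
        raise ValueError("caching must be one of %s, got %r"
                         % (", ".join(VALID_CACHING), value))
    return mode


def set_caching(sys_manager, keyspace, column_family, value):
    """Set the caching mode through an object shaped like pycassa's SystemManager.

    Assumption: the `caching` keyword reaches the Thrift CfDef in the
    pycassa version you have installed.
    """
    mode = normalize_caching(value)
    sys_manager.alter_column_family(keyspace, column_family, caching=mode)
    return mode


def alter_table_cql(keyspace, column_family, value):
    """Build the equivalent CQL statement (Cassandra 1.2/2.0 string syntax assumed)."""
    mode = normalize_caching(value)

    def quote(name):
        # Double-quote identifiers so names that start with a digit or
        # contain upper-case letters (as in the log above) stay intact.
        return '"%s"' % name.replace('"', '""')

    return "ALTER TABLE %s.%s WITH caching = '%s';" % (
        quote(keyspace), quote(column_family), mode)


if __name__ == "__main__":
    print(alter_table_cql("53c7deadcc9b10271a2df9f0", "B", "keys_only"))
```

With a real connection you would pass a pycassa SystemManager instance as sys_manager, or run the generated statement in cqlsh. The cache sizes themselves are not set here: as noted above, they are global settings in cassandra.yaml.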