[SOLVED] Pycassa and Cassandra: doing a select based on columns only

I'm new to both technologies and I'm trying to do the following:

select * from mytable where column = "col1" or column="col2"

So far, the documentation says I should use the get method by using:

 family.get('rowid')

But I do not have the row ID. How would I run the above query?

Thanks

Solution

I think you're mixing two ideas. The query you've written is CQL, and Pycassa doesn't support CQL (at least to my knowledge); it talks to Cassandra through the Thrift API instead.

However, regardless of the query interface you use, if you don't know the row key you will have to create secondary indexes on the columns you want to query.

You can do exactly that in Pycassa. Consider the following code fragment:

from pycassa.columnfamily import ColumnFamily
from pycassa.pool import ConnectionPool
from pycassa.index import create_index_clause, create_index_expression
from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY, LONG_TYPE

sys = SystemManager('192.168.56.110:9160')

# Start from a clean slate; ignore the error if the keyspace does not exist yet.
try:
    sys.drop_keyspace('TestKeySpace')
except Exception:
    pass

sys.create_keyspace('TestKeySpace', SIMPLE_STRATEGY, {'replication_factor': '1'})
sys.create_column_family('TestKeySpace', 'mycolumnfamily')

sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column1', LONG_TYPE)
sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column2', LONG_TYPE)

sys.create_index('TestKeySpace', 'mycolumnfamily', 'column1', value_type=LONG_TYPE, index_name='column1_index')
sys.create_index('TestKeySpace', 'mycolumnfamily', 'column2', value_type=LONG_TYPE, index_name='column2_index')

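# Connect to the same node as the SystemManager above (the default is localhost:9160).
pool = ConnectionPool('TestKeySpace', server_list=['192.168.56.110:9160'])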
col_fam = ColumnFamily(pool, 'mycolumnfamily')

col_fam.insert('row_key0', {'column1': 10, 'column2': 20})
col_fam.insert('row_key1', {'column1': 20, 'column2': 20})
col_fam.insert('row_key2', {'column1': 30, 'column2': 20})
col_fam.insert('row_key3', {'column1': 10, 'column2': 20})

# OrderedDict([('column1', 10), ('column2', 20)])
print col_fam.get('row_key0')

# Find rows using the secondary indexes (all expressions in a clause are ANDed).
# API docs: http://pycassa.github.io/pycassa/api/pycassa/
column1_expr = create_index_expression('column1', 10)
column2_expr = create_index_expression('column2', 20)

clause = create_index_clause([column1_expr, column2_expr], count=20)

for key, columns in col_fam.get_indexed_slices(clause):
        print "Key => %s, column1 = %d, column2 = %d" % (key, columns['column1'], columns['column2'])

sys.close()
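
Note that an index clause combines its expressions with AND, so it does not directly give you the "or" from your original query. One way to get that, as a rough sketch rather than anything built into Pycassa, is to run one equality query per value and merge the results by row key. The helper rows_matching_any and the run_query callable below are names I made up for illustration, and fake_query is an in-memory stand-in so the example runs without a cluster:

```python
from collections import OrderedDict


def rows_matching_any(run_query, column, values):
    """Emulate `column = v1 OR column = v2 ...` with one equality query per value.

    run_query(column, value) is a hypothetical callable that yields
    (row_key, columns) pairs, the same shape get_indexed_slices returns.
    """
    merged = OrderedDict()
    for value in values:
        for key, columns in run_query(column, value):
            # Keep the first hit per row key, in case `values` has repeats.
            merged.setdefault(key, columns)
    return merged


# In-memory stand-in for the column family used in the fragment above.
FAKE_ROWS = OrderedDict([
    ('row_key0', {'column1': 10, 'column2': 20}),
    ('row_key1', {'column1': 20, 'column2': 20}),
    ('row_key2', {'column1': 30, 'column2': 20}),
    ('row_key3', {'column1': 10, 'column2': 20}),
])


def fake_query(column, value):
    # Stand-in for an indexed lookup: scan the fake rows for an equal value.
    for key, columns in FAKE_ROWS.items():
        if columns.get(column) == value:
            yield key, columns


result = rows_matching_any(fake_query, 'column1', [10, 30])
print(sorted(result))  # ['row_key0', 'row_key2', 'row_key3']
```

Against a real cluster, run_query would presumably wrap col_fam.get_indexed_slices with a clause built from a single create_index_expression(column, value). Each value costs one round trip, so this only makes sense for a handful of values.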

That said, consider whether you can design your data model so that you can query by row key instead.
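
One common way to do that is to maintain your own index column family, where the row key encodes the value you want to look up and the column names are the keys of the matching data rows. Below is a minimal sketch of the idea, with plain dicts standing in for the two column families so it runs without a cluster. The key scheme in index_key and the helper names are my own assumptions, not anything Pycassa provides:

```python
from collections import defaultdict

# Stand-ins for two column families: row key -> {column name: value}.
data_cf = {}
index_cf = defaultdict(dict)


def index_key(column, value):
    # Hypothetical key scheme for index rows, e.g. 'column1:10'.
    return '%s:%s' % (column, value)


def insert_row(row_key, columns):
    """Write the data row and one index entry per column."""
    data_cf[row_key] = dict(columns)
    for column, value in columns.items():
        # The column *name* is the data row key; the column value is unused.
        index_cf[index_key(column, value)][row_key] = ''


def find_rows(column, value):
    """Look up matching rows using row keys only: index row first, then data rows."""
    keys = sorted(index_cf.get(index_key(column, value), {}))
    return [(key, data_cf[key]) for key in keys]


insert_row('row_key0', {'column1': 10, 'column2': 20})
insert_row('row_key1', {'column1': 20, 'column2': 20})
insert_row('row_key2', {'column1': 30, 'column2': 20})
insert_row('row_key3', {'column1': 10, 'column2': 20})

print([key for key, _ in find_rows('column1', 10)])  # ['row_key0', 'row_key3']
```

With Pycassa the same shape would be an insert into each ColumnFamily on write, and a get on the index row followed by a multiget on the data column family on read. The sketch does not remove stale index entries when a value changes, so a real version would have to handle updates and deletes as well.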