cassandra pycassa

Pycassa and Cassandra: doing a select based on columns only


I'm new to both technologies and I'm trying to do the following:

So far, the documentation says I should use the get method:

 family.get('rowid')

But I do not have the row ID. How would I run the above query?

Thanks


Solution

  • In general I think you're mixing two ideas. The query you've written is in CQL, and Pycassa doesn't support CQL (at least to my knowledge).

    However, regardless of the query interface used, if you don't know the row key, you will have to create secondary indexes on the queried columns.

    You can do just that in Pycassa; consider the following code fragment:

    from pycassa.columnfamily import ColumnFamily
    from pycassa.pool import ConnectionPool
    from pycassa.index import *
    from pycassa.system_manager import *
    
    sys = SystemManager('192.168.56.110:9160')
    
    # Start from a clean slate; ignore the error if the keyspace does not exist yet.
    try:
            sys.drop_keyspace('TestKeySpace')
    except Exception:
            pass
    
    sys.create_keyspace('TestKeySpace', SIMPLE_STRATEGY, {'replication_factor': '1'})
    sys.create_column_family('TestKeySpace', 'mycolumnfamily')
    
    sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column1', LONG_TYPE)
    sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column2', LONG_TYPE)
    
    sys.create_index('TestKeySpace', 'mycolumnfamily', 'column1', value_type=LONG_TYPE, index_name='column1_index')
    sys.create_index('TestKeySpace', 'mycolumnfamily', 'column2', value_type=LONG_TYPE, index_name='column2_index')
    
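    # Connect to the same node as the SystemManager above (the default is localhost:9160).
    pool = ConnectionPool('TestKeySpace', server_list=['192.168.56.110:9160'])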
    col_fam = ColumnFamily(pool, 'mycolumnfamily')
    
    col_fam.insert('row_key0', {'column1': 10, 'column2': 20})
    col_fam.insert('row_key1', {'column1': 20, 'column2': 20})
    col_fam.insert('row_key2', {'column1': 30, 'column2': 20})
    col_fam.insert('row_key3', {'column1': 10, 'column2': 20})
    
    # OrderedDict([('column1', 10), ('column2', 20)])
    print col_fam.get('row_key0')
    
    ## Find using index: http://pycassa.github.io/pycassa/api/pycassa/
    column1_expr = create_index_expression('column1', 10)
    column2_expr = create_index_expression('column2', 20)
    
    clause = create_index_clause([column1_expr, column2_expr], count=20)
    
    for key, columns in col_fam.get_indexed_slices(clause):
            print "Key => %s, column1 = %d, column2 = %d" % (key, columns['column1'], columns['column2'])
    
    sys.close()
    

    That said, consider whether you can design your data model so that the row key itself answers the query, instead of relying on secondary indexes.
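
    One way to query by row key is to maintain a second "lookup" column family by hand: its row key is built from the queried values, and its column names are the row keys of the matching data rows. The sketch below only illustrates the idea. `InMemoryColumnFamily`, `lookup_key`, `insert_with_lookup` and `find_row_keys` are hypothetical names, and the in-memory class is a stand-in so the example runs without a cluster; with Pycassa the same `insert` and `get` calls would go to two `ColumnFamily` objects.

```python
from collections import OrderedDict


def lookup_key(column1, column2):
    """Build the row key of the lookup row for one (column1, column2) pair."""
    return "%d:%d" % (column1, column2)


class InMemoryColumnFamily(object):
    """Hypothetical stand-in for a column family: row key -> ordered columns."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Merge the new columns into the row, creating the row if needed.
        self.rows.setdefault(key, OrderedDict()).update(columns)

    def get(self, key):
        # Raises KeyError for a missing row (an assumption of this stand-in).
        return self.rows[key]


def insert_with_lookup(data_cf, lookup_cf, row_key, columns):
    """Write the data row, then record its key under the matching lookup row."""
    data_cf.insert(row_key, columns)
    # Column name = original row key; the value is an unused placeholder.
    lookup_cf.insert(lookup_key(columns['column1'], columns['column2']),
                     {row_key: ''})


def find_row_keys(lookup_cf, column1, column2):
    """Return the matching row keys with a single get by row key."""
    try:
        return list(lookup_cf.get(lookup_key(column1, column2)))
    except KeyError:
        return []


data_cf = InMemoryColumnFamily()
lookup_cf = InMemoryColumnFamily()

insert_with_lookup(data_cf, lookup_cf, 'row_key0', {'column1': 10, 'column2': 20})
insert_with_lookup(data_cf, lookup_cf, 'row_key1', {'column1': 20, 'column2': 20})
insert_with_lookup(data_cf, lookup_cf, 'row_key2', {'column1': 30, 'column2': 20})
insert_with_lookup(data_cf, lookup_cf, 'row_key3', {'column1': 10, 'column2': 20})

# Equivalent of "column1 = 10 and column2 = 20", answered by row key alone.
matches = find_row_keys(lookup_cf, 10, 20)
rows = [data_cf.get(key) for key in matches]
```

    The trade-off is that every write now touches two rows, and when column1 or column2 of an existing row changes, the old lookup entry has to be removed by the application.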