
Valueless Column Technique in Cassandra - Database Schema


I'm using Cassandra 0.8.2.

I am attempting to use the "valueless column" technique to set up my Cassandra schema. The idea behind the valueless column is the following: the name of your column becomes the relevant information, and the value of the name/value pair is empty. This is used to make queries faster; it is an example of denormalization. I want the name of each column to be the URL of a back link, and the row key to be a UUID of the back link's target URL. Is this even a good idea/schema design?
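To make the idea concrete, here is a minimal in-memory Python sketch (not Cassandra client code) of the layout described above: one row per target URL, one column per back link, and an empty value in every column. All names are hypothetical: the dictionary `ArticleBackLinks` only stands in for the column family, and deriving the row key with `uuid.uuid5(uuid.NAMESPACE_URL, ...)` is an assumption about how "a UUID of the target URL" might be computed.

```python
import uuid

# Hypothetical in-memory stand-in for the ArticleBackLinks column family:
# row key -> {column name -> column value}.
ArticleBackLinks: dict[str, dict[str, bytes]] = {}


def row_key_for(target_url: str) -> str:
    # Assumption: a name-based UUID (uuid5) derived from the target URL.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, target_url))


def add_back_link(target_url: str, source_url: str) -> None:
    # The column name carries the data (the linking URL); the value is empty.
    ArticleBackLinks.setdefault(row_key_for(target_url), {})[source_url] = b""


def back_links(target_url: str) -> list[str]:
    # One row lookup returns every back link; only the column names matter.
    # sorted() stands in for the comparator's ordering of column names.
    return sorted(ArticleBackLinks.get(row_key_for(target_url), {}))


if __name__ == "__main__":
    target = "http://example.com/some-article"  # hypothetical target URL
    for source in ["www.reddit.com", "www.cnn.com", "www.arstechnica.com"]:
        add_back_link(target, source)
    print(back_links(target))  # ['www.arstechnica.com', 'www.cnn.com', 'www.reddit.com']
```

Because the back links are the column names, fetching all of them is a single row read, and recording a link that already exists just overwrites an empty column.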

I'm using a very basic example to get the point of my question across. Here's what I have set up using cassandra-cli:

create column family ArticleBackLinks 
with comparator = UTF8Type
and key_validation_class = UTF8Type
and default_validation_class = UTF8Type
and column_metadata = 
[
{column_name: www.arstechnica.com, validation_class: UTF8Type},        
{column_name: www.apple.com, validation_class:UTF8Type},         
{column_name: www.cnn.com, validation_class: UTF8Type},      
{column_name: www.stackoverflow.com, validation_class: UTF8Type}, 
{column_name: www.reddit.com, validation_class: UTF8Type}
];

I get the error:

Command not found: `create column family ArticleBackLink...

I think my error is due to the period I am using in the column_name. In short, I would like to know if some of you have come across better ways to use the "valueless column" idea in Cassandra? Any good/better examples of the valueless column technique? Is my idea even the right way to use the valueless column technique?

Thanks in advance, guys.


Solution

  • I think the CLI does not accept the dots in an unquoted column_name; wrapping each name in single quotes works:

    [default@stackoverflow] create column family ArticleBackLinks with
    ...     comparator = UTF8Type and
    ...     default_validation_class = UTF8Type and
    ...     column_metadata =
    ...     [
    ...     {column_name: 'www.arstechnica.com', validation_class: UTF8Type},
    ...     {column_name: 'www.apple.com', validation_class:UTF8Type},
    ...     {column_name: 'www.cnn.com', validation_class: UTF8Type},
    ...     {column_name: 'www.stackoverflow.com', validation_class: UTF8Type},
    ...     {column_name: 'www.reddit.com', validation_class: UTF8Type}
    ...     ];
    881b31f0-bc64-11e0-0000-242d50cf1ff7
    Waiting for schema agreement...
    ... schemas agree across the cluster
    

    By the way, since you are using Cassandra 0.8.2, you should leverage CQL.

    A statement like this will be helpful in the future:

    UPDATE <COLUMN FAMILY> [USING <CONSISTENCY> 
    [AND TIMESTAMP <timestamp>] [AND TTL <timeToLive>]] 
    SET name1 = value1, name2 = value2 WHERE <KEY> = keyname;
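As a rough illustration of applying that template to valueless columns, the hypothetical Python helper below only builds the statement text; it writes each back-link URL as a column name with an empty value. It assumes the column family uses UTF8Type for row keys, column names and values (as in the question's definition), that the row key is addressed as `KEY`, that names and values can be written as single-quoted string literals, and that an embedded single quote is escaped by doubling it. Check these assumptions against the CQL documentation for your version before relying on them.

```python
def cql_quote(term: str) -> str:
    # Assumed escaping rule: wrap in single quotes, double any embedded quote.
    return "'" + term.replace("'", "''") + "'"


def valueless_update(column_family: str, row_key: str, column_names: list[str]) -> str:
    # Hypothetical helper: one "name = ''" assignment per column name,
    # so the name carries the data and the value stays empty.
    if not column_names:
        raise ValueError("need at least one column name")
    assignments = ", ".join(f"{cql_quote(name)} = ''" for name in column_names)
    return f"UPDATE {column_family} SET {assignments} WHERE KEY = {cql_quote(row_key)};"


if __name__ == "__main__":
    print(valueless_update("ArticleBackLinks", "some-row-key", ["www.cnn.com", "www.reddit.com"]))
    # UPDATE ArticleBackLinks SET 'www.cnn.com' = '', 'www.reddit.com' = '' WHERE KEY = 'some-row-key';
```

Building statements by string concatenation is only meant to show their shape; in an application, prefer whatever parameter handling your client library offers.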
    

    Refer to this


    Update: added more thoughts, as asked in a comment.

    It's a good idea to keep grouped information in one place. Data that lives in a single row can be fetched with one read by row key, which adds to the efficiency that Cassandra provides.

    For example, in your case the category could be the row key and the URLs the column names. Then your front end can display a categorized view quickly, because you know that arstechnica and stackoverflow come under the technology group, which is a row key. It adds a tiny bit of extra work when you insert data.
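Here is a minimal in-memory Python sketch of that category layout, again with hypothetical names (`CategoryLinks`, `add_url`, `urls_in`) and a plain dictionary standing in for the column family. The row key is the category, the URLs are column names with empty values, and `sorted()` stands in for the comparator's ordering of column names.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a category-keyed column family:
# row key = category, column names = URLs, column values = empty.
CategoryLinks: defaultdict[str, dict[str, bytes]] = defaultdict(dict)


def add_url(category: str, url: str) -> None:
    # Insert-time work: the writer must know which category the URL belongs to.
    CategoryLinks[category][url] = b""


def urls_in(category: str) -> list[str]:
    # One row lookup returns the whole category; .get() avoids creating empty rows.
    # sorted() stands in for the comparator's ordering of column names.
    return sorted(CategoryLinks.get(category, {}))


if __name__ == "__main__":
    add_url("technology", "www.stackoverflow.com")
    add_url("technology", "www.arstechnica.com")
    add_url("news", "www.cnn.com")
    print(urls_in("technology"))  # ['www.arstechnica.com', 'www.stackoverflow.com']
```

The extra insert-time work mentioned above shows up in `add_url`: the writer has to supply the category for each URL, and in return the read side needs only one row lookup per category.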

    I use Cassandra 0.6.x, so unfortunately I can't say much about the secondary indexes that Cassandra 0.7.0+ supports. But supposedly you can achieve what is explained above by adding a column, say category, to the main CF (the one that ArticleBackLinks serves as an index for) and then simply querying with CQL's SELECT ... WHERE ....
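To illustrate the secondary-index idea, here is an in-memory Python emulation: a hypothetical `Articles` column family whose rows carry a `category` column, plus a hand-maintained map from category value to row keys that plays the role of the index. This only mimics the kind of question an index lets you answer; it is not how Cassandra implements secondary indexes, and with a real index Cassandra keeps that map up to date for you.

```python
from collections import defaultdict

# Hypothetical main column family: row key -> {column name -> value}.
Articles: dict[str, dict[str, str]] = {}
# Emulated index on the "category" column: category value -> row keys.
category_index: defaultdict[str, set[str]] = defaultdict(set)


def insert_article(row_key: str, columns: dict[str, str]) -> None:
    # Remove the row from its old index entry before applying the write.
    old = Articles.get(row_key, {}).get("category")
    if old is not None:
        category_index[old].discard(row_key)
    Articles.setdefault(row_key, {}).update(columns)
    # Re-index under the (possibly unchanged) current category.
    new = Articles[row_key].get("category")
    if new is not None:
        category_index[new].add(row_key)


def select_where_category(value: str) -> list[str]:
    # Roughly what "SELECT ... WHERE category = value" answers: matching row keys.
    return sorted(category_index.get(value, set()))


if __name__ == "__main__":
    insert_article("a1", {"url": "www.arstechnica.com", "category": "technology"})
    insert_article("a2", {"url": "www.cnn.com", "category": "news"})
    insert_article("a3", {"url": "www.stackoverflow.com", "category": "technology"})
    print(select_where_category("technology"))  # ['a1', 'a3']
```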

    Secondary indexes might remove the need for a separate 'index CF'. You may want to look into these: