
Help me better understand Digg's Cassandra data model


http://about.digg.com/blog/looking-future-cassandra

I found this article about Digg's move to Cassandra, but I didn't understand the author's idea of a bucket for each (user, item) pair. A few more details on the idea would help me understand the solution better.

Thanks


Solution

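  • It sounds like they are using a super column family with one row per user and one super column per item; each subcolumn under an item's super column represents a friend who dugg that item. Read that way, the "bucket" for a (user, item) pair would be the set of subcolumns stored under that item in the user's row. In pycassa, at least, this makes an insert as simple as: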

    column_family.insert(user, {item: {friend: ''}})
    

    They could also have done this a couple of other ways, and I'm not sure which they chose.

    One is to use a standard column family, use a (user,item) combination for the row key, and use one column per friend who dugg the item:

    column_family.insert(user + item, {friend: ''})
    

    Another is to use a standard column family, use just (user) for the row key, and use an (item, friend) combination for the column name:

    column_family.insert(user, {item + friend: ''})
    

    It doesn't sound like this is what they used, but it's an acceptable option as well.
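
To make the three layouts above easier to compare, here is a minimal sketch that models each one with plain Python dictionaries instead of a real Cassandra cluster. Nothing here is Digg's actual code or the pycassa API: the names `record_digg`, `friends_super`, `friends_pair`, `friends_wide`, and the `SEP` separator used to build composite keys are all hypothetical, chosen only for illustration. Each lookup function returns the "bucket" of friends who dugg a given item for a given user.

```python
from collections import defaultdict

SEP = ":"  # hypothetical separator for composite keys (an assumption)

# Layout 1: super column family. row = user, super column = item, subcolumn = friend
super_cf = defaultdict(lambda: defaultdict(dict))
# Layout 2: standard column family. row key = (user, item), column = friend
pair_cf = defaultdict(dict)
# Layout 3: standard column family. row key = user, column name = (item, friend)
wide_cf = defaultdict(dict)


def record_digg(user, item, friend):
    """Store 'friend dugg item' in user's bucket, once per layout."""
    super_cf[user][item][friend] = ""
    pair_cf[user + SEP + item][friend] = ""
    wide_cf[user][item + SEP + friend] = ""


def friends_super(user, item):
    # The bucket is the set of subcolumns under the item's super column.
    return sorted(super_cf.get(user, {}).get(item, {}))


def friends_pair(user, item):
    # The bucket is the whole row keyed by the (user, item) pair.
    return sorted(pair_cf.get(user + SEP + item, {}))


def friends_wide(user, item):
    # The bucket is the slice of columns whose name starts with "item:".
    prefix = item + SEP
    row = wide_cf.get(user, {})
    return sorted(name[len(prefix):] for name in row if name.startswith(prefix))


record_digg("alice", "story42", "bob")
record_digg("alice", "story42", "carol")
record_digg("alice", "story7", "bob")

print(friends_super("alice", "story42"))
print(friends_pair("alice", "story42"))
print(friends_wide("alice", "story42"))
```

All three lookups return the same bucket; the layouts differ only in where that bucket lives (a super column, a whole row, or a range of columns within a row). Adding a separator to composite keys, as this sketch does, avoids collisions that plain concatenation such as `user + item` could cause when one key is a prefix of another.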