database-designnosqlcouchbasedocument-oriented-db

Designing record keys for document-oriented database - best practices


Our team has started development of an application backed by Couchbase DB; for every one of us it's the first experience with a no-SQL database.

We've started to define our entities and adopted the practice of using "type" prefixes, suggested by Couchbase manual:

Entity "A":
key: a#123

Entity "B":
key: b#123

But we realized that we're getting confused with choosing the strategy for creating compound document keys. We use counters a lot, and they require their own documents. Our keys have become complicated:

Daily counter "x" for entity "A":
key: cntrx#a#123-20140117

We've considered different approaches, but we're still greenhorns on the subject and would like to ask some advice.

Are hierarchical keys any good at all? Can anyone share their best practices for defining non-trivial keys?


Solution

  • In our project we used hierarchical keys in a way described below: First part of a key is something like table name from RDBMS: users - represents "table"

    Then each user has it's own id in example:

    users:1 - "represents one user"

    We used ':', because I think it just looks nicer than other delimiters. You can use any delimiter you like.

    If you want to use sequential indexes like id in previous example, you'll need get them from some key, so:

    users:counter - key that holds "last user id" (it acts like autoincrement)

    If you need to store some "subsection" for user account, you can store it:

    users:<user's id>:subsection.

    More complex example

    users:1:avatars:1:url - means that by this key we will get avatar url of user 1, but if user wants to store many avatars they will go under users:1:avatars:X:url, where X is will be a value of users:1:avatars:counter key.

    We used this strategy for all docs, that store only one value, JSON or even binary data.

    So exactly for your example, I'll chose:

    a:123-20140117:counter - that will mean that we have (speaking in RDBMS language) table named "a", in table "a" we have record with id (or something else) "123-20140117" that has field "cntrx".

    UPD: About key size. Actually it doesn't matter. Yes keys are limited in size, but there are a lot of ways to reduce it. One of them - use hashes, but I think it's bad way, because keys will be long and consume more memory. In our project we used "short" keys for memcached bucket. We had a enum (that can be also stored in couchbase) that represent human understandable key name and it's shorten value.

    Example: we have some set of records: list of users that have more than 30 photos. So we have a key-value pair:

    usersByPhotosCount - k:ubpc:{0}

    and for 30 photos key will be k:ubpc:30.

    But it's better to do such optimizations only on production. In development it's better to have understandable keys in app and database (i.e. you can create two sets of k-v pairs: normal for development, shorten and obfuscated for production and load them depending on your environment).