mongodbcassandrahbasenon-relational-database

Non-Relational Database Design For Big Data Warehouse


Suppose I need to design a table for Spotify where I need to quickly retrieve what items (song or album) a user already purchased so it can play for the user. The scenario is straightforward: when users click to buy the song, the database needs to quickly update a particular song being purchased to the user account.

Since it really requires near real-time response and the table could be increased exponentially, on the other hand, the access format is quite simple and fix, a non-relational database is designed for this use case. That's why I am thinking about using HBase, Cassandra, or MongoDB.

I would like to use UserId as the primary key for this Purchase Table, will Wide Column Stores like (HBase or Cassandra) or Document databases like MongoDB work better for this scenario?

The input is just a user_id and the database table response with all available purchased items. What is the best database table design strategy?

{user_id:int
   {purchased_item: item1
                    item2
                    item3
   }
}

The second table will be used for searching for specific artists, albums, genres, and songs that are available for purchase.

Appreciate if you can share any examples of best practice from the real-world application. Or any good article/document/blogs I can read.


Solution

  • If you are considering near real-time I would definitely consider using Cassandra especially for history detailed storage!

    What I would do using Cassandra is the folowing:

    CREATE TABLE purchases( user_id uuid, purchase_id uuid, item_id uuid, item_details text, item_name text, time_of purchase timestamp, PRIMARY KEY((user_id), purchase_id, item_id));

    This will let you cluster the data in several ways first using the user_id then using the purchase_id to keep all items recorded per purchase!

    By having the the Primary key formed of Partition key the user_id the clustering key the purchase_id and item_id we are able to group the items in the purchase_id and then in the user_id.

    https://cassandra.apache.org/doc/latest/data_modeling/intro.html https://docs.datastax.com/en/landing_page/doc/landing_page/current.html