I have an issue with designing a database in MongoDB.
In general, the system will continuously gather user insight data (e.g. likes, retweets, views) from the APIs of different social websites (Twitter API, Instagram API, Facebook API), polling each channel at a different rate. It will also save each insight every hour as historical data. Users should be able to view the current real-time insights on the website. Should I save the current insight data in a cache and the historical insight data in documents?
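The split described above (latest values in a cache, hourly copies stored as documents) can be pictured with a small in-memory sketch. Everything here is hypothetical: `InsightStore`, `update`, `snapshot` and the document fields are illustrative names. `latest` stands in for whatever cache is used, and `history` stands in for a MongoDB collection (with PyMongo the appends would become something like `collection.insert_many(docs)`).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InsightStore:
    # Hypothetical stand-ins: `latest` plays the role of the cache holding
    # real-time values, `history` the role of a MongoDB collection of
    # hourly snapshot documents.
    latest: dict = field(default_factory=dict)   # (channel, item_id) -> metrics
    history: list = field(default_factory=list)  # list of snapshot documents

    def update(self, channel: str, item_id: str, metrics: dict) -> None:
        """Overwrite the current metrics for one item whenever an API poll returns."""
        self.latest[(channel, item_id)] = dict(metrics)

    def snapshot(self, now: datetime) -> int:
        """Append one history document per item, stamped with the current hour."""
        hour = now.replace(minute=0, second=0, microsecond=0)
        for (channel, item_id), metrics in self.latest.items():
            self.history.append({
                "channel": channel,
                "item_id": item_id,
                "hour": hour,
                "metrics": dict(metrics),
            })
        return len(self.latest)


store = InsightStore()
store.update("twitter", "123", {"likes": 10, "retweets": 2})
store.update("twitter", "123", {"likes": 12, "retweets": 3})  # newer poll wins
store.update("instagram", "abc", {"likes": 40, "views": 900})
written = store.snapshot(datetime(2024, 1, 1, 13, 47, tzinfo=timezone.utc))
```

With this shape, frequent polls only overwrite cached values, and MongoDB receives one document per tracked item per hour regardless of how often each channel is polled.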
What are the expected write rate and query rate? At what rate will the dataset grow? These are the key questions that determine the size and topology of your MongoDB cluster. If your write rate does not exceed the write capacity of a single node, then you should be able to host your data on a single replica set. However, this assumes that your dataset is not large (roughly 1 TB or less). Beyond that size, recovering from a single node failure can be time-consuming: it will not cause an outage, but the longer a node is down, the higher the risk that a second node fails.
In either case (the write rate exceeds the capacity of a single node, or the dataset is larger than 1 TB), the rough guidance is that it is time to consider a [sharded cluster][2]. Designing a sharded cluster is beyond the scope of a single StackOverflow answer.
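The sizing questions and rule of thumb above can be turned into a back-of-the-envelope check. This is a sketch with made-up inputs: the channel counts, poll intervals and the 500-byte document size are hypothetical, and `node_write_capacity` is a placeholder you would have to measure on your own hardware. It assumes every poll result is written to MongoDB; if real-time values live in a cache instead, only the hourly snapshots hit the database and the write rate drops accordingly.

```python
from dataclasses import dataclass

ONE_TB = 10**12  # the rough dataset threshold mentioned above, in bytes


@dataclass
class Channel:
    name: str
    tracked_items: int      # posts/accounts polled on this channel
    poll_interval_s: float  # seconds between polls (differs per channel)


def writes_per_second(channels: list[Channel]) -> float:
    """Write rate if every poll result is written to MongoDB once."""
    return sum(c.tracked_items / c.poll_interval_s for c in channels)


def history_bytes_per_day(channels: list[Channel], doc_size_bytes: int) -> int:
    """Growth from hourly snapshots: one document per tracked item per hour."""
    items = sum(c.tracked_items for c in channels)
    return items * 24 * doc_size_bytes


def consider_sharding(write_rate: float, node_write_capacity: float,
                      dataset_bytes: int) -> bool:
    """Rule of thumb: shard if one node can't absorb the writes or data exceeds ~1 TB."""
    return write_rate > node_write_capacity or dataset_bytes > ONE_TB


channels = [
    Channel("twitter", tracked_items=10_000, poll_interval_s=60),
    Channel("instagram", tracked_items=5_000, poll_interval_s=300),
    Channel("facebook", tracked_items=5_000, poll_interval_s=300),
]
rate = writes_per_second(channels)                            # about 200 writes/s
growth = history_bytes_per_day(channels, doc_size_bytes=500)  # 240,000,000 bytes/day
days_to_1tb = ONE_TB / growth
```

With these illustrative numbers the history alone would take roughly 4,167 days to reach 1 TB, so the write rate compared with a single node's measured capacity is the more likely trigger for sharding.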