I have a server storing 5,000 content documents. Let's say I have 1 million users who each query for 50 new documents at their own pace, until all content has been seen.
I want to make sure that each user only sees and interacts with the content once and never again, like Tinder.
My first thought was to tag each document with a list of the user-ids of the users who have seen it. However, this list would get really long, up to 1 million user-ids per document, and that sounds like it would really kill query performance.
Does anyone have a better idea of how I can return content to users just once and never again?
P.S. I am planning to build this with MongoDB.
P.P.S. I thought about keeping a list of 'document-ids-seen' on the user's document and then, with every query made by that user, filtering out results that match 'document-ids-seen'. The same challenge applies here: the query would grow linearly as the user keeps interacting and bringing in new content.
The solution depends on the exact meaning of "at their own pace".
Your second post suggests that the time schedule is up to the user, but she will be presented with the documents in an order determined by your application, e.g. getting news items in the order of their creation timestamps. In that case, your timestamp or auto-increment solution will work, and it has only a small impact on data volume and query complexity.
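To illustrate the fixed-order case, here is a minimal Python sketch. It assumes each document carries an increasing sequence number and each user stores only the highest number already served. The names `seq`, `last_seen_seq` and `next_batch` are made up for this example, and the in-memory list stands in for the collection.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    last_seen_seq: int = 0  # highest sequence number already served (hypothetical field)


def next_batch(docs, user, batch_size=50):
    """Return the next unseen documents in sequence order and advance the user's cursor."""
    # Only documents beyond the cursor are candidates; everything at or below it was served.
    unseen = sorted(
        (d for d in docs if d["seq"] > user.last_seen_seq),
        key=lambda d: d["seq"],
    )
    batch = unseen[:batch_size]
    if batch:
        # One integer per user is all the state that needs to be stored.
        user.last_seen_seq = batch[-1]["seq"]
    return batch
```

With MongoDB, the same idea would presumably be a filter like `{"seq": {"$gt": last_seen_seq}}`, sorted by `seq` and limited to 50, so the query size stays constant no matter how much the user has already seen.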
If, however, the user may also choose which documents to view, this won't work any more, as the documents already viewed may be scattered across the entire document set. A solution to handle this efficiently consists of two design ideas:
(a) Consider whether most users, at a given point in time, will have viewed a small or a large part of the entire document set. If only a small selection of documents is expected to be of interest to a particular user, then the count of documents the user has viewed will be rather small. (E.g. assume the documents are about IT and one user only wants to look at MongoDB docs, another mainly at Linux docs.) If all users will be interested in most or all of the documents, then the count of documents a particular user has not viewed will be small. (E.g. a set of news that everyone tries to follow.) Depending on which is the case, store with each user whichever list is the small one, the viewed or the not-viewed document ids, which will also simplify the query for the documents still to be viewed.
(b) With each user, don't store a list of single document ids (viewed or not viewed), but a list of intervals of such ids. E.g., if you store ids of documents not yet viewed, and some documents get added to the database, then, the next time the user's record is loaded, her highest interval will be updated from (someLowerId, formerHighestId) to (someLowerId, currentHighestId). When a user views a document, the interval containing its id gets split from (lowId, highId) into (lowId, viewedId - 1), (viewedId + 1, highId), where one or both of these intervals may be empty. Including or excluding intervals like these will also simplify the queries as opposed to listing single ids.
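The interval bookkeeping from (b) could look roughly like the Python sketch below. It assumes integer document ids, inclusive intervals kept sorted in ascending order, and a list of not-yet-viewed ids. The function names and the `doc_id` field are hypothetical, and the last function only builds a MongoDB-style filter document as a plain dict.

```python
def extend_to(intervals, former_highest, current_highest):
    """Account for documents added since the user's record was last loaded."""
    if current_highest <= former_highest:
        return list(intervals)
    if intervals and intervals[-1][1] == former_highest:
        # The highest interval reached the old maximum: stretch it to the new one.
        low, _ = intervals[-1]
        return intervals[:-1] + [(low, current_highest)]
    # Otherwise the old maximum was already viewed: the new ids form their own interval.
    return list(intervals) + [(former_highest + 1, current_highest)]


def mark_viewed(intervals, viewed_id):
    """Split the interval containing viewed_id, dropping any empty halves."""
    result = []
    for low, high in intervals:
        if low <= viewed_id <= high:
            if low <= viewed_id - 1:
                result.append((low, viewed_id - 1))
            if viewed_id + 1 <= high:
                result.append((viewed_id + 1, high))
        else:
            result.append((low, high))
    return result


def unseen_filter(intervals, field="doc_id"):
    """Build a filter matching ids that fall inside any not-yet-viewed interval."""
    if not intervals:
        # $or needs a non-empty array, so use a filter that matches nothing.
        return {field: {"$in": []}}
    return {"$or": [{field: {"$gte": low, "$lte": high}} for low, high in intervals]}
```

The filter grows with the number of intervals rather than with the number of viewed documents, which is the point of (b). If a user views documents in a very scattered pattern, the interval list can still grow long, so this pays off mainly when views cluster into runs.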