I'm working on a simple app where a User can follow other users. Users can star posts. And a user's feed is composed of posts that have been starred by users they follow. Pretty simple actually. However, this all gets complicated in Mongo and Meteor...
There are basically two ways of modeling this that I can think of:
A user has a property, following, which is an array of userIds that the user follows. Also, a post has a property, starrers, which is an array of userIds that have starred this post. The good thing about this method is that publications are relatively simple:
    Meteor.publish 'feed', (limit) ->
      Posts.find({starrers: {$in: Meteor.users.findOne(@userId).following}}, {sort: {date: -1}, limit: limit})
We aren't reactively listening to who the user is following, but that's not too bad for now. This approach has two main problems: (1) individual documents will become large and inefficient if 1,000,000 people star a post, and (2) it would be a pain to keep track of information like when a user started following another user or when a user starred a post.
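To make the shape of this first model concrete, here is a minimal sketch in plain JavaScript, with no Mongo or Meteor involved. The sample data and the helper name feedForUser are made up for illustration; the helper only mimics what the $in query on starrers would select, assuming each post carries a numeric date.

```javascript
// Hypothetical in-memory stand-ins for the Users and Posts collections.
const users = [
  { _id: "alice", following: ["bob", "carol"] },
  { _id: "bob", following: [] },
  { _id: "carol", following: [] },
];

const posts = [
  { _id: "p1", date: 1, starrers: ["bob"] },
  { _id: "p2", date: 2, starrers: ["dave"] },
  { _id: "p3", date: 3, starrers: ["carol", "bob"] },
];

// Hypothetical helper: mimics
// Posts.find({starrers: {$in: following}}, {sort: {date: -1}, limit: limit}).
function feedForUser(userId, limit) {
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  const following = new Set(user.following);
  return posts
    // keep posts starred by at least one followed user
    .filter((post) => post.starrers.some((id) => following.has(id)))
    // newest first
    .sort((a, b) => b.date - a.date)
    .slice(0, limit);
}

console.log(feedForUser("alice", 10).map((p) => p._id)); // p3, then p1
```

Note that every star adds one more entry to the post's starrers array, which is exactly the growth that problem (1) is about.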
The other way of doing this is having two more collections, Stars and Follows. If a user stars a post, then we create a document with properties userId and postId. If a user follows another user, then we create a document with properties userId and followId. This gives us the advantage of smaller document sizes for Users and Posts, but complicates things when it comes to querying, especially because Mongo doesn't handle joins!
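As a rough illustration of why querying gets more complicated, here is the same feed computed against the second model in plain JavaScript, again with no Mongo or Meteor. The sample data and the helper name feedFromJoins are hypothetical; the point is only that the "join" becomes three separate lookups done in application code.

```javascript
// Hypothetical in-memory stand-ins for the Follows, Stars and Posts collections.
const follows = [
  { userId: "alice", followId: "bob" },
  { userId: "alice", followId: "carol" },
  { userId: "bob", followId: "alice" },
];

const stars = [
  { userId: "bob", postId: "p1" },
  { userId: "carol", postId: "p3" },
  { userId: "bob", postId: "p3" },
  { userId: "dave", postId: "p2" },
];

const posts = [
  { _id: "p1", date: 1 },
  { _id: "p2", date: 2 },
  { _id: "p3", date: 3 },
];

// Hypothetical helper: performs the join by hand in three steps.
function feedFromJoins(userId, limit) {
  // Step 1: ids of the users that userId follows.
  const followIds = new Set(
    follows.filter((f) => f.userId === userId).map((f) => f.followId)
  );
  // Step 2: ids of the posts those users starred (the Set drops duplicates).
  const postIds = new Set(
    stars.filter((s) => followIds.has(s.userId)).map((s) => s.postId)
  );
  // Step 3: the posts themselves, newest first, limited.
  return posts
    .filter((p) => postIds.has(p._id))
    .sort((a, b) => b.date - a.date)
    .slice(0, limit);
}

console.log(feedFromJoins("alice", 10).map((p) => p._id)); // p3, then p1
```

Each step depends on the result of the previous one, so a reactive version has to redo the later steps whenever an earlier one changes.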
I did some research, and people seem to agree that the second choice is the right way to go. The problem I'm having now is querying and publishing efficiently. Based on the Discover Meteor chapter about Advanced Publications, I created a publication that publishes the posts starred by the users that the current user follows, sorted and limited.
    # A helper that restarts an observer: it stops the previous handle before
    # starting a new one, and stops the current handle when the subscription stops.
    observer = (sub, func) ->
      handle = null
      sub.onStop ->
        handle?.stop?()
      () ->
        handle?.stop?()
        handle = func()
    Meteor.publish 'feed', (limit) ->
      sub = this
      userId = @userId
      followIds = null
      eventIds = null
      publishFollows = observer sub, () ->
        # Maps Follows document _id -> followed user's id.
        followIds = {}
        Follows.find({userId: userId}).observeChanges
          added: (id, doc) ->
            followIds[id] = doc.followId
            sub.added('follows', id, doc)
            publishStars()
          removed: (id) ->
            delete followIds[id]
            sub.removed('follows', id)
            publishStars()
      publishStars = observer sub, () ->
        # Maps Stars document _id -> starred post's id.
        eventIds = {}
        # Match on the followed users' ids (the values), not the Follows document ids (the keys).
        Stars.find({userId: {$in: _.values(followIds)}}).observeChanges
          added: (id, doc) ->
            eventIds[id] = doc.postId
            sub.added('stars', id, doc)
            publishEvents()
          removed: (id) ->
            delete eventIds[id]
            sub.removed('stars', id)
            publishEvents()
      publishEvents = observer sub, () ->
        # Events is assumed to be the same collection the rest of this question calls Posts.
        Events.find({_id: {$in: _.values(eventIds)}}, {sort: {name: 1, date: -1}, limit: limit}).observeChanges
          added: (id, doc) ->
            sub.added('events', id, doc)
          changed: (id, fields) ->
            sub.changed('events', id, fields)
          removed: (id) ->
            sub.removed('events', id)
      # Start the observer chain, then tell the client the initial data has been sent.
      publishFollows()
      sub.ready()
While this works, it seems very limited at scale. In particular, we have to compile a list of every post starred by every user that the current user follows. The size of this list will grow very quickly. Then we do a huge $in query against all posts.
Another annoyance is querying for the feed on the client after we subscribe:
    Meteor.subscribe("feed", 20)
    posts = null
    Tracker.autorun ->
      followers = _.pluck(Follows.find({userId: Meteor.userId()}).fetch(), "followId")
      starredPostIds = _.pluck(Stars.find({userId: {$in: followers}}).fetch(), "postId")
      posts = Posts.find({_id: {$in: starredPostIds}}, {sort: {date: -1}, limit: 20}).fetch()
It's like we're doing all this work twice. First we do all the work on the server to publish the feed. Then we need to go through the exact same logic again on the client to get those posts...
My question here is a matter of design over everything. How can I efficiently design this feed based on followed users starring posts? What collections and collection schemas should I use? How should I create the appropriate publication? How can I query for the feed on the client?
So it turns out that Mongo and "non-relational" databases simply aren't designed for relational data. Thus, there is no solution here with Mongo. I've ended up using Neo4j, but SQL would work fine as well.