
Modeling and publishing a follower-based feed with Meteor


I'm working on a simple app in which a user can follow other users and star posts. A user's feed consists of the posts that have been starred by the users they follow. Pretty simple, actually. However, it all gets complicated in Mongo and Meteor...

There are basically two ways of modeling this that I can think of:

  1. A user has a property, following, which is an array of userIds that the user follows. Also, a post has a property, starrers, which is an array of userIds that have starred this post. The good thing about this method is that publications are relatively simple:

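    Meteor.publish 'feed', (limit) ->
      return @ready() unless @userId  # nothing to publish for logged-out clients
      following = Meteor.users.findOne(@userId)?.following ? []
      Posts.find({starrers: {$in: following}}, {sort: {date: -1}, limit: limit})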
    

    We aren't reactively listening to changes in whom the user follows, but that's not too bad for now. The main problems with this approach are that (1) a post's document becomes large and inefficient to update if a million people star it (and MongoDB caps a single document at 16 MB), and (2) it would be a pain to keep track of information such as when a user started following another user or when a user starred a post.

  2. The other way of doing this is to have two more collections, Stars and Follows. If a user stars a post, we create a document with the properties userId and postId. If a user follows another user, we create a document with the properties userId and followId. This gives us smaller Users and Posts documents, but complicates querying, especially because Mongo doesn't handle joins!
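To make the difference between the two schemas concrete, here is a toy, in-memory sketch in plain JavaScript (no Meteor or Mongo involved). The field names `following`, `starrers`, `userId`, `postId`, and `followId` come from the descriptions above; the sample users, posts, and the numeric `date` values are made up for illustration. Option 1 mimics `{starrers: {$in: following}}`, which matches a post if any element of its `starrers` array is in the list; option 2 performs the join by hand in three steps.

```javascript
// Option 1: arrays embedded in the documents (illustrative data).
const users1 = [{ _id: "alice", following: ["bob", "carol"] }];
const posts1 = [
  { _id: "p1", date: 1, starrers: ["bob"] },
  { _id: "p2", date: 2, starrers: ["dave"] },
  { _id: "p3", date: 3, starrers: ["carol", "dave"] },
];

// Rough equivalent of Posts.find({starrers: {$in: following}}, {sort: {date: -1}, limit})
function feedOption1(users, posts, userId, limit) {
  const following = new Set(users.find((u) => u._id === userId).following);
  return posts
    .filter((p) => p.starrers.some((s) => following.has(s)))
    .sort((a, b) => b.date - a.date)
    .slice(0, limit)
    .map((p) => p._id);
}

// Option 2: separate Follows and Stars collections (illustrative data).
const follows2 = [
  { userId: "alice", followId: "bob" },
  { userId: "alice", followId: "carol" },
];
const stars2 = [
  { userId: "bob", postId: "p1" },
  { userId: "dave", postId: "p2" },
  { userId: "carol", postId: "p3" },
  { userId: "dave", postId: "p3" },
];
const posts2 = [
  { _id: "p1", date: 1 },
  { _id: "p2", date: 2 },
  { _id: "p3", date: 3 },
];

function feedOption2(follows, stars, posts, userId, limit) {
  // Step 1: ids of the users that userId follows.
  const followed = new Set(follows.filter((f) => f.userId === userId).map((f) => f.followId));
  // Step 2: ids of the posts those users starred (a Set removes duplicates).
  const starred = new Set(stars.filter((s) => followed.has(s.userId)).map((s) => s.postId));
  // Step 3: the posts themselves, newest first, limited.
  return posts
    .filter((p) => starred.has(p._id))
    .sort((a, b) => b.date - a.date)
    .slice(0, limit)
    .map((p) => p._id);
}

console.log(feedOption1(users1, posts1, "alice", 10)); // [ 'p3', 'p1' ]
console.log(feedOption2(follows2, stars2, posts2, "alice", 10)); // [ 'p3', 'p1' ]
```

Both produce the same feed; the difference is that option 2 needs three dependent lookups where option 1 needs one.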

I did some research, and people seem to agree that the second option is the right way to go. The problem I'm having now is querying and publishing efficiently. Based on the Discover Meteor chapter on Advanced Publications, I created a publication that publishes the posts starred by the users the current user follows, sorted and limited:

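# a helper that (re)starts an observeChanges and stops the previous one
observer = (sub, func) ->
  handle = null
  # stop the current observer when the subscription stops
  sub.onStop ->
    handle?.stop?()
  # calling the returned function stops the old observer and starts a new one
  ->
    handle?.stop?()
    handle = func()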


Meteor.publish 'feed', (limit) ->
  sub = this
  userId = @userId
  return sub.ready() unless userId

  followIds = {}  # Follows doc _id -> id of the followed user
  postIds = {}    # Stars doc _id -> id of the starred post

  publishFollows = observer sub, ->
    followIds = {}
    Follows.find({userId: userId}).observeChanges
      added: (id, doc) ->
        followIds[id] = doc.followId
        sub.added('follows', id, doc)
        publishStars()
      removed: (id) ->
        delete followIds[id]
        sub.removed('follows', id)
        publishStars()

  publishStars = observer sub, ->
    postIds = {}
    # query by the followed users' ids (the values), not the Follows doc ids (the keys)
    Stars.find({userId: {$in: _.values(followIds)}}).observeChanges
      added: (id, doc) ->
        postIds[id] = doc.postId
        sub.added('stars', id, doc)
        publishPosts()
      removed: (id) ->
        delete postIds[id]
        sub.removed('stars', id)
        publishPosts()

  publishPosts = observer sub, ->
    Posts.find({_id: {$in: _.values(postIds)}}, {sort: {date: -1}, limit: limit}).observeChanges
      added: (id, doc) ->
        sub.added('posts', id, doc)
      changed: (id, fields) ->
        sub.changed('posts', id, fields)
      removed: (id) ->
        sub.removed('posts', id)

  publishFollows()  # start the chain: follows -> stars -> posts
  sub.ready()
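The `observer` helper is what lets each stage restart when the stage before it changes. Below is a minimal plain-JavaScript sketch of that restartable-observer pattern, assuming only that a subscription exposes `onStop(callback)` and that starting an observer returns an object with a `stop()` method (as an observeChanges handle does). `FakeSub` and `makeHandle` are hypothetical stand-ins so the behavior can be run outside Meteor.

```javascript
function observer(sub, start) {
  let handle = null;
  // When the subscription stops, stop whatever observer is currently running.
  sub.onStop(() => handle && handle.stop());
  // Each call stops the previous observer (if any) and starts a fresh one.
  return () => {
    if (handle) handle.stop();
    handle = start();
  };
}

// Hypothetical stand-in for a publication context.
class FakeSub {
  constructor() { this.callbacks = []; }
  onStop(fn) { this.callbacks.push(fn); }
  stop() { this.callbacks.forEach((fn) => fn()); }
}

// Hypothetical stand-in for starting observeChanges; records start/stop events.
const log = [];
let counter = 0;
function makeHandle() {
  const n = ++counter;
  log.push(`start ${n}`);
  return { stop: () => log.push(`stop ${n}`) };
}

const sub = new FakeSub();
const restart = observer(sub, makeHandle);
restart();  // starts observer 1
restart();  // stops observer 1, starts observer 2
sub.stop(); // stops observer 2
console.log(log); // [ 'start 1', 'stop 1', 'start 2', 'stop 2' ]
```

Note that stopping a handle only stops future callbacks; it does not send `removed` for documents the old observer already published, so a real publication still has to reconcile those itself.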

While this works, it seems very limited at scale. In particular, we have to compile a list of every post starred by every followed user, and the size of this list grows very quickly. Then we run a huge $in query against all posts.
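A quick toy illustration of that growth, using synthetic data rather than any measurement: the list passed to `$in` contains every post starred by every followed user, regardless of how small `limit` is. The counts below (50 followed users, 200 distinct stars each) are arbitrary assumptions; overlap between users' stars would shrink the list somewhat, but it still scales with followed users times stars.

```javascript
// Collect the distinct post ids starred by any of the followed users,
// i.e. what ends up inside {_id: {$in: [...]}} in the publication.
function gatherStarredPostIds(followedIds, stars) {
  const followed = new Set(followedIds);
  const ids = new Set();
  for (const s of stars) {
    if (followed.has(s.userId)) ids.add(s.postId);
  }
  return [...ids];
}

// Synthetic data: 50 followed users, each of whom starred 200 distinct posts.
const followedIds = Array.from({ length: 50 }, (_, u) => `user${u}`);
const stars = [];
for (const userId of followedIds) {
  for (let i = 0; i < 200; i++) stars.push({ userId, postId: `${userId}-post${i}` });
}

const limit = 20;
const inList = gatherStarredPostIds(followedIds, stars);
console.log(`${inList.length} ids in the list for a page of ${limit}`); // 10000 ids in the list for a page of 20
```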

Another annoyance is querying for the feed on the client after we subscribe:

Meteor.subscribe("feed", 20)
posts = null
Tracker.autorun ->
  followedIds = _.pluck(Follows.find({userId: Meteor.userId()}).fetch(), "followId")
  starredPostIds = _.pluck(Stars.find({userId: {$in: followedIds}}).fetch(), "postId")
  posts = Posts.find({_id: {$in: starredPostIds}}, {sort: {date: -1}, limit: 20}).fetch()

It's like we're doing all this work twice: first on the server to publish the feed, and then again, with exactly the same logic, on the client to get those posts...
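One way to at least avoid writing the join twice is a single helper written against the collection interface that Meteor offers on both client and server, `find(selector, options).fetch()`. The sketch below assumes that; `feedPosts` is a hypothetical name, and `FakeCollection` is a tiny in-memory stand-in that only understands equality and `$in` selectors plus a one-field sort and a limit.

```javascript
// Hypothetical shared helper: the feed join written once.
function feedPosts(Follows, Stars, Posts, userId, limit) {
  const followedIds = Follows.find({ userId }).fetch().map((f) => f.followId);
  const postIds = Stars.find({ userId: { $in: followedIds } }).fetch().map((s) => s.postId);
  return Posts.find({ _id: { $in: postIds } }, { sort: { date: -1 }, limit }).fetch();
}

// Minimal fake collection: equality and {$in: [...]} selectors,
// a single-field numeric sort (1 or -1), and a limit.
class FakeCollection {
  constructor(docs) { this.docs = docs; }
  find(selector, options = {}) {
    const matches = (doc) =>
      Object.entries(selector).every(([key, cond]) =>
        cond && typeof cond === "object" && "$in" in cond
          ? cond.$in.includes(doc[key])
          : doc[key] === cond);
    let result = this.docs.filter(matches);
    if (options.sort) {
      const [[field, dir]] = Object.entries(options.sort);
      result = result.slice().sort((a, b) => (a[field] - b[field]) * dir);
    }
    if (options.limit !== undefined) result = result.slice(0, options.limit);
    return { fetch: () => result };
  }
}

const Follows = new FakeCollection([{ userId: "alice", followId: "bob" }]);
const Stars = new FakeCollection([
  { userId: "bob", postId: "p1" },
  { userId: "bob", postId: "p2" },
  { userId: "carol", postId: "p3" },
]);
const Posts = new FakeCollection([
  { _id: "p1", date: 1 },
  { _id: "p2", date: 2 },
  { _id: "p3", date: 3 },
]);

console.log(feedPosts(Follows, Stars, Posts, "alice", 20).map((p) => p._id)); // [ 'p2', 'p1' ]
```

This removes duplicated code, not duplicated work: the client still repeats the lookups, and a fetch-based version like this is not reactive inside a server publication, which still needs the observeChanges chain.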

My question, then, is really about design. How can I efficiently design this feed based on followed users starring posts? What collections and schemas should I use? How should I write the corresponding publication? How can I query for the feed on the client?


Solution

  • So it turns out that Mongo and "non-relational" databases simply aren't designed for relational data. Thus, there is no solution here with Mongo. I've ended up using Neo4j, but SQL would work fine as well.