Opinion on MongoDB schema design for my case


This is a pretty common question on MongoDB: When to embed and when to reference.

However, in my case this is something of a dilemma. I have a document that references another document which I could simply embed instead, but embedding would cost me disk space. Keeping the reference, on the other hand, comes with a performance cost.

Here's an example. Say I have this Member document, where the 'detail' field is the part I'm unsure about:

Member: {
    _id: "asdf",
    detail: {
        name: "Stack Overflow",
        website: "www.stackoverflow.com"
    }
}

I want this member's detail to appear with every Blog that member "asdf" creates, because every blog page displays the member details. So I have two options for my Blog document:

First, reference the member by storing only the member's _id:

Blog: {
    _id: 123,
    memberId: "asdf"  // used as a reference to query the specific member
}

Second, embed the member in the Blog instead:

Blog: {
    _id: 123,
    member: {
        _id: "asdf",
        detail: {
            name: "Stack Overflow",
            website: "www.stackoverflow.com"
        }
    }
}

So the first option requires another query for the member, which is a performance cost. The second option is faster because I only need to query once, but my disk usage grows with the redundant embedded 'member' data as the number of Blogs grows.
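
To make the trade-off concrete, here is a minimal sketch of the two read paths. It uses plain Python dictionaries as stand-ins for the Member and Blog collections, so no real MongoDB driver is involved; `find_one`, `members`, `blogs_referenced`, and `blogs_embedded` are hypothetical names chosen for illustration. It only shows the access pattern: two lookups per blog with a reference, one lookup with an embedded copy.

```python
import copy

# In-memory stand-ins for MongoDB collections (assumption: a real app
# would go through a driver; these dicts only illustrate the access pattern).
members = {
    "asdf": {
        "_id": "asdf",
        "detail": {"name": "Stack Overflow", "website": "www.stackoverflow.com"},
    },
}

# Option 1: each blog stores only the member's _id.
blogs_referenced = {
    123: {"_id": 123, "memberId": "asdf"},
}

# Option 2: each blog carries its own copy of the member document.
blogs_embedded = {
    123: {"_id": 123, "member": copy.deepcopy(members["asdf"])},
}

lookups = 0  # counts simulated round trips to the database


def find_one(collection, key):
    """Hypothetical helper: one lookup by _id, counted as one query."""
    global lookups
    lookups += 1
    return collection[key]


def read_referenced(blog_id):
    blog = find_one(blogs_referenced, blog_id)    # query 1: the blog
    member = find_one(members, blog["memberId"])  # query 2: its member
    return blog["_id"], member["detail"]


def read_embedded(blog_id):
    blog = find_one(blogs_embedded, blog_id)      # single query
    return blog["_id"], blog["member"]["detail"]


lookups = 0
referenced_result = read_referenced(123)
referenced_lookups = lookups

lookups = 0
embedded_result = read_embedded(123)
embedded_lookups = lookups
```

Both paths return the same blog id and member detail; the referenced read costs two simulated lookups and the embedded read costs one, while the embedded design stores one extra copy of the detail per blog.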

PS: As you can see in this example, the Member-to-Blog relationship is one-to-many: a member can have many blogs, but the member's detail fields ('name' and 'website') stay the same across all of them.

Any opinion on which is better in this case? It would be great if you also have a third solution. Thanks in advance.


Solution

  • I think it is fine to keep the member details separate, like a forum signature. That way when a member updates their details all the posts will show their current information without your application having to update duplicate data in every previous post.

    From your description it sounds like you may only be displaying this on blog posts the users create, rather than on every comment they make on a page.

    If you are worried about the performance cost of an extra query per user, you could always cache that user data (or the generated page output) instead of relying on fetching all the blog info in a single DB query. I would see how the application performs in actual usage before trying to optimize for a use case that may not be a problem.

    Another approach would be to only show the extra user details as an Ajax hover (similar to how SO shows more information for an established user).
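
The caching suggestion can be sketched roughly as follows. This is a minimal in-process illustration, assuming member details live in their own collection and each blog holds only a `memberId`. The `MemberCache` class, the `fetch_member` function, the dictionary-backed "collection", and the replacement name "New Name" are all hypothetical and not part of any MongoDB API. A real deployment would more likely use a shared cache with expiry.

```python
# Stand-in for the members collection (assumption: in a real app this
# would be a database query through a driver).
members = {
    "asdf": {"name": "Stack Overflow", "website": "www.stackoverflow.com"},
}

db_queries = 0  # counts simulated database round trips


def fetch_member(member_id):
    """Hypothetical database lookup; each call counts as one query."""
    global db_queries
    db_queries += 1
    return dict(members[member_id])


class MemberCache:
    """Tiny read-through cache keyed by member _id (illustrative only)."""

    def __init__(self, loader):
        self._loader = loader
        self._entries = {}

    def get(self, member_id):
        # Hit the database only on a cache miss.
        if member_id not in self._entries:
            self._entries[member_id] = self._loader(member_id)
        return self._entries[member_id]

    def invalidate(self, member_id):
        # Call this when a member edits their details.
        self._entries.pop(member_id, None)


cache = MemberCache(fetch_member)
blogs = [{"_id": n, "memberId": "asdf"} for n in (123, 124, 125)]

# Rendering three blogs by the same member costs one member query.
rendered = [(b["_id"], cache.get(b["memberId"])["name"]) for b in blogs]
queries_after_render = db_queries

# The member updates their details once, in one place.
members["asdf"]["name"] = "New Name"
cache.invalidate("asdf")
updated_name = cache.get("asdf")["name"]
```

Because the details are stored once, the single update is picked up by every blog on the next read, and the cache keeps the extra member query from being repeated for each blog by the same author.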