
Are document-oriented databases any more suitable than relational ones for persisting objects?


In terms of database usage, the last decade was the age of the ORM, with hundreds competing to persist our object graphs in plain old-fashioned RDBMSs. Now we seem to be witnessing the coming of age of document-oriented databases. These databases are highly optimized for schema-free documents but are also very attractive for their ability to scale out and query a cluster in parallel.

Document-oriented databases also hold a couple of advantages over RDBMSs for persisting data models in object-oriented designs. As the data store is schema-free, one can store objects belonging to different classes in an inheritance hierarchy side by side. Also, as the domain model changes, so long as the code can cope with getting back objects from an old version of the domain classes, one can avoid having to migrate the whole database at every change.
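To make the "side by side" point concrete, here is a minimal sketch in Python. It assumes a plain list standing in for a schema-free collection, and the `_type` tag, the `TYPES` registry and the `Post`/`Article`/`VideoPost` classes are all hypothetical names chosen for illustration, not the convention of any particular database.

```python
from dataclasses import dataclass, asdict


@dataclass
class Post:
    title: str


@dataclass
class Article(Post):
    body: str


@dataclass
class VideoPost(Post):
    url: str
    duration_seconds: int


# Registry mapping the stored type tag back to a class (hypothetical convention).
TYPES = {cls.__name__: cls for cls in (Post, Article, VideoPost)}


def to_document(obj):
    """Turn an object into a dict and record which class it came from."""
    doc = asdict(obj)
    doc["_type"] = type(obj).__name__
    return doc


def from_document(doc):
    """Rebuild the right subclass from the stored type tag."""
    fields = {k: v for k, v in doc.items() if k != "_type"}
    return TYPES[doc["_type"]](**fields)


# A plain list stands in for a schema-free collection: the two documents
# have different fields, yet they live next to each other.
collection = [
    to_document(Article(title="Hello", body="First post")),
    to_document(VideoPost(title="Demo", url="http://example.com/v", duration_seconds=90)),
]

restored = [from_document(d) for d in collection]
```

In a relational schema the same hierarchy would typically need a table-per-class or single-table mapping with nullable columns; here each document simply carries the fields its class has.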

On the other hand, the performance benefits of document-oriented databases mainly appear to come about when storing deeper documents: in object-oriented terms, classes which are composed of other classes, for example a blog post and its comments. In most of the examples of this I can come up with, though, such as the blog one, the gain in read access would appear to be offset by the penalty of having to write the whole blog post "document" every time a new comment is added.
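The trade-off can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: `DocumentStore`, its `put`/`get` methods and the `bytes_written` counter are hypothetical, and the store deliberately replaces the whole document on every write. Some real document databases offer partial in-place updates, which this sketch does not model.

```python
import json


class DocumentStore:
    """Toy store: every put serializes and replaces the whole document."""

    def __init__(self):
        self._docs = {}
        self.bytes_written = 0

    def put(self, key, doc):
        payload = json.dumps(doc)
        self.bytes_written += len(payload)  # track how much each write costs
        self._docs[key] = payload

    def get(self, key):
        return json.loads(self._docs[key])


def add_comment(store, key, author, text):
    post = store.get(key)  # one read returns the post and all its comments
    post["comments"].append({"author": author, "text": text})
    store.put(key, post)   # but the whole document is written back


store = DocumentStore()
store.put("post:1", {"title": "Hello", "body": "x" * 1000, "comments": []})

before = store.bytes_written
add_comment(store, "post:1", "ann", "Nice!")
cost = store.bytes_written - before  # bytes written to add one short comment
```

Under these assumptions, adding a five-character comment rewrites more than the full 1000-character body, which is exactly the write penalty described above; the read side, in exchange, is a single `get`.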

It looks to me as though document-oriented databases can bring significant benefits to object-oriented systems if one takes extreme care to organize the objects in deep graphs optimized for the way the data will be read and written, but this means knowing the use cases up front. In the real world, we often don't know them until we actually have a live implementation we can profile.

So is the case of relational vs. document-oriented databases one of swings and roundabouts? I'm interested in people's opinions and advice, in particular if anyone has built any significant applications on a document-oriented database.


Solution

  • Well, it depends on how your data is structured and on your data-access patterns.

    Document databases store and retrieve documents, and the basic atomic unit of storage is a document. As you said, you need to think about your data-access patterns and use cases to create a smart document model. When your domain model can be split and partitioned across documents, a document database works like a charm. For example, for blog software, a CMS or wiki software, a document DB works extremely well. As long as you can find a good way to squeeze your data into documents, you won't have any problems. But don't try to fit a relational model into a document database. As soon as your data-access patterns involve a lot of 'navigation' across relations, graph or object databases are a more natural choice.

    Another thing is the read/write performance trade-off. Take blog software as an example. In a traditional RDBMS data model the data is normalized. This means that reading the data is expensive, because you have to read from different tables and compute relations with joins just to read a blog post. In exchange, changing a tag is inexpensive. In contrast, in a document database reading a blog post is cheap, because you just load the post document. However, updating is probably more expensive, because you need to store the whole document, or worse, go through a lot of documents to change something (the rename-a-tag scenario). In most systems, reading is way more important than writing, so it actually makes sense to use denormalized data stores.

    I think that on large databases the schema-free design can have its advantages. In an RDBMS you need to upgrade your schema, which is a really painful process, especially converting the existing data to the new schema. In a schema-free database, your application needs to deal with that, which gives more flexibility. For example, you can upgrade the schema on the fly when an old document is accessed. This way, you can keep your giant database up and running while the application handles older versions on the fly.
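The upgrade-on-access idea from the last paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the database, and the `schema_version` field, the `UPGRADES` table and the v1-to-v2 change (splitting a `name` field in two) are all hypothetical.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    # Hypothetical change: v1 stored a single "name"; v2 splits it in two.
    first, _, last = doc["name"].partition(" ")
    return {"schema_version": 2, "first_name": first, "last_name": last}


# One upgrade step per old version, applied in sequence until current.
UPGRADES = {1: upgrade_v1_to_v2}


def load(store, key):
    """Read a document, upgrading it on the fly if it is an old version."""
    doc = store[key]
    version = doc.get("schema_version", 1)  # untagged documents count as v1
    upgraded = False
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
        upgraded = True
    if upgraded:
        store[key] = doc  # write back so the upgrade happens only once
    return doc


# A dict stands in for the database; old and new documents coexist.
store = {
    "user:1": {"name": "Ada Lovelace"},
    "user:2": {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
}

user = load(store, "user:1")
```

Documents that are never read stay in the old shape indefinitely, so the application has to keep every upgrade step around (or run a background sweep) for as long as old versions may still exist.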