I am evaluating a number of NoSQL implementations (RavenDB and MongoDB at the moment) as a means of solving a specific set of requirements that involve storage/retrieval of data that is schema-less. I want to get some feedback on whether NoSQL is the direction I should be looking in, or if there are other (potentially simpler) options.
Essentially we have a software product that (among other things) defines a basic domain model consisting of a few related entities, each of which has a number of attributes (key/value). As we release to the customer, we work with them to set up the attributes and values, which is essentially the configuration of the system. This is fairly straightforward, and because the design is known up front, we don't need anything dynamic to achieve this and make it perform (we will use an RDBMS). The attributes are not known up front, but again this is not a problem as this part of the system pretty much revolves around an attribute model.
The problem is that for different customers, and AFTER we release and are in production, we find that we need to query for specific sets of attribute data that we knew nothing about when we compiled and released the code (and before we configured the attributes for the customer). We basically need to produce data from the attribute maps that we can store (we won't know the structure up front) and then query that stored data later in ways we can't anticipate. The thinking right now is that we can create hooks that get hit during processing and allow us to plug-in libraries (likely via MEF) that create the data so it gets stored, and then query it later when needed (not for reporting--usually to create additional data/attributes).
(Note that creating the hooks and plug-in libraries is a separate problem, and is not intended to be part of this question.)
A common scenario might be: "I want to know how many times xxx occurred in the last 10 days". So I would create a plug-in that would recognize that xxx has occurred, and write it to a data store with a date/time. Then I would create another plug-in (probably in the same DLL) that would perform the query, and add an attribute to the model called "CountOfxxxInLast10Days". Another scenario might be to create configurable lookups. So I might have a plug-in that runs at startup to create/update a table of lookup data that could convert one attribute value to another, or (more likely) a range of values that would convert to a single lookup value. So the conversion plug-in might add a table with columns: bottom_value, top_value, multiplier, and the query plug-in would query the table using an attribute value, like "SELECT multiplier FROM table WHERE [attribute_value] BETWEEN bottom_value AND top_value". The query plug-in might then write the result to an attribute called "Multiplier".
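To make the lookup scenario concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the .NET stack we would actually use, and the table name multiplier_lookup, the sample ranges, and the OrderTotal attribute are all made up for illustration.

```python
import sqlite3

# In-memory database standing in for whatever store the plug-ins would use.
conn = sqlite3.connect(":memory:")

# "Conversion plug-in": creates the lookup table at startup (hypothetical name).
conn.execute(
    "CREATE TABLE multiplier_lookup ("
    " bottom_value REAL NOT NULL,"
    " top_value REAL NOT NULL,"
    " multiplier REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO multiplier_lookup VALUES (?, ?, ?)",
    [(0, 99, 1.0), (100, 499, 1.5), (500, 999, 2.0)],  # sample ranges only
)


def lookup_multiplier(conn, attribute_value):
    """'Query plug-in': return the multiplier whose range contains the value."""
    row = conn.execute(
        "SELECT multiplier FROM multiplier_lookup"
        " WHERE ? BETWEEN bottom_value AND top_value",
        (attribute_value,),
    ).fetchone()
    # None signals that no configured range covers the value.
    return row[0] if row else None


# The attribute map for one entity; the result is added as a new attribute.
attributes = {"OrderTotal": 250}
attributes["Multiplier"] = lookup_multiplier(conn, attributes["OrderTotal"])
```

The point of the sketch is that neither the table nor the query is known to the core product; both would live in the plug-in.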
In certain cases, old data could be purged after a specified period of time. In the first scenario described above, it might be desirable to remove data from the store/cache that was older than ten days.
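The first scenario, including the purge, could be sketched roughly like this. Again this is Python with SQLite purely for illustration; the occurrences table, the function names, and the fixed "now" timestamp are assumptions, not part of our product.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical event table: one row per recognized occurrence.
conn.execute(
    "CREATE TABLE occurrences (event_name TEXT NOT NULL, occurred_at TEXT NOT NULL)"
)


def _stamp(when):
    # ISO-8601 text at fixed precision sorts chronologically as plain strings.
    return when.isoformat(timespec="seconds")


def record_occurrence(conn, event_name, when):
    """Recording plug-in: store that the event happened at a given time."""
    conn.execute("INSERT INTO occurrences VALUES (?, ?)", (event_name, _stamp(when)))


def count_in_last_days(conn, event_name, now, days=10):
    """Query plug-in: count occurrences within the trailing window."""
    cutoff = _stamp(now - timedelta(days=days))
    return conn.execute(
        "SELECT COUNT(*) FROM occurrences WHERE event_name = ? AND occurred_at >= ?",
        (event_name, cutoff),
    ).fetchone()[0]


def purge_older_than(conn, now, days=10):
    """Housekeeping: delete rows that fell out of the window; returns rows removed."""
    cutoff = _stamp(now - timedelta(days=days))
    return conn.execute(
        "DELETE FROM occurrences WHERE occurred_at < ?", (cutoff,)
    ).rowcount


now = datetime(2024, 1, 31, 12, 0, 0)  # fixed clock so the example is repeatable
for days_ago in (1, 3, 9, 15, 40):
    record_occurrence(conn, "xxx", now - timedelta(days=days_ago))

attributes = {"CountOfxxxInLast10Days": count_in_last_days(conn, "xxx", now)}
removed = purge_older_than(conn, now)
```

Because the purge uses the same cutoff as the count, removing old rows never changes the value of the attribute.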
In other cases data would need to be persisted permanently, like in the second scenario above. It's possible this data could simply be re-created at startup, as opposed to held in a permanent store.
Additional requirements:
We are pretty committed to the .Net platform at this point, so any option would have to have a solid .Net client/API.
There are three possible options, each with pros and cons.
You're already storing the entities in a relational database. You can store the undefined attributes in an extra table that has a Key and Value column, and an EntityId column that references the entity to which the attributes belong. Basically, you'll be using part of your database as a key-value store.
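A minimal sketch of such a table, using Python and SQLite only because it is easy to run; the same layout applies to any RDBMS. The table names Entity and EntityAttribute and the helper functions are hypothetical, and storing every value as text is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.execute("CREATE TABLE Entity (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
# The extra table: one row per (entity, attribute) pair.
conn.execute(
    "CREATE TABLE EntityAttribute ("
    ' EntityId INTEGER NOT NULL REFERENCES Entity(Id),'
    ' "Key" TEXT NOT NULL,'
    ' "Value" TEXT,'
    ' PRIMARY KEY (EntityId, "Key"))'
)


def set_attribute(conn, entity_id, key, value):
    """Insert the attribute, or overwrite it if the entity already has that key."""
    conn.execute(
        'INSERT OR REPLACE INTO EntityAttribute (EntityId, "Key", "Value")'
        " VALUES (?, ?, ?)",
        (entity_id, key, str(value)),  # everything is stored as text here
    )


def get_attributes(conn, entity_id):
    """Return all attributes of one entity as a plain dict."""
    rows = conn.execute(
        'SELECT "Key", "Value" FROM EntityAttribute WHERE EntityId = ?',
        (entity_id,),
    )
    return dict(rows.fetchall())


conn.execute("INSERT INTO Entity (Id, Name) VALUES (1, 'Customer A')")
set_attribute(conn, 1, "Region", "EMEA")
set_attribute(conn, 1, "CreditLimit", 5000)
set_attribute(conn, 1, "CreditLimit", 7500)  # overwrites the earlier value
attrs = get_attributes(conn, 1)
```

Note that the values come back as strings, so type information has to be tracked elsewhere if you need it.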
Advantages:
Disadvantages:
Key-value stores, such as Redis and Riak, or the more advanced Apache Cassandra, are optimized for storing key-value pairs (no surprise there...). You can use a key-value store next to your RDBMS, dedicated to storing the attributes, while keeping the entities in your RDBMS.
Advantages:
Disadvantages:
You could use a document database to store just the attributes. But you can also take the plunge and store everything in a document database, including your entities.
Advantages:
Disadvantages:
Apache CouchDB has quite a list of applications using it and receives positive feedback from the Stack Overflow community. It has a few drivers for .NET, but I can't tell you how mature these drivers are.
MongoDB has quite an impressive list of production deployments. There are three major drivers for .NET available, which all seem to be of good quality.
RavenDB has excellent support for .NET as it was designed for the .NET platform. However, I haven't been able to find examples of large production environments running on RavenDB. Still, I think it's definitely worth exploring.
I don't have much hands-on experience with any of them in production environments, so I don't know exactly how easy they are to back up and restore. But given that these NoSQL systems aren't as rigid as an RDBMS, I'd guess they should be easier to back up and restore without downtime.