Tags: sql, mongodb, database-design, db4o, nosql

Is NoSQL the best option for this specific database problem?


I have a problem and I think a NoSQL solution is the answer, but I am not sure. I am also not sure which type of NoSQL database (object, document, graph, key-value, etc.) would be best suited to solving it.

Problem:

I have two collections. CollectionA contains 2K+ strings (domain names). CollectionB is much larger and looks (pseudo) like this:

{
    "To" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
    "Bcc" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
    "From" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
    "Subject" : "Email Subject",
    "Unknown" : "NumberOfFields",
    "N" : "PlusOneExtraFields"
}

Knowns:

  1. There can be hundreds of people listed in the To, Bcc, and From strings.
  2. I don't have a good way to explode the To, From, Bcc fields.
  3. Without a way to explode the To, From, Bcc fields I am forced to search strings.
  4. There are a few known fields but many unknown fields.
  5. The requirements don't currently call for searching across the unknown fields.
  6. The database engine needs to run on a Windows desktop.

Current line of thinking:

Use a NoSQL solution, possibly together with the C# `dynamic` keyword?

Fuzzy

  1. Is this a problem that is easily solved by a document database?

  2. Is searching/comparing across this type of data structure something that calls for Map/Reduce?


Solution

  • I totally agree with @HighTechRider: denormalizing the data ("exploding" it, as you put it) seems a must for performant queries if the volume of data is as large as you imply. Otherwise it doesn't matter which product you pick — every query will end up as a free-text scan of one fashion or another.
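    To make the "exploding" concrete, here is a minimal sketch in Python of flattening the comma-separated address fields into one row per address. The field names follow the example document in the question; the function name, `message_id`, and row shape are my own illustrative choices, not anything from the question.

    ```python
    def explode_addresses(message_id, message):
        """Yield one (message_id, field, address, domain) row per address."""
        for field in ("To", "Bcc", "From"):
            for raw in message.get(field, "").split(","):
                address = raw.strip().lower()
                if "@" in address:  # skip empty/malformed entries
                    domain = address.rsplit("@", 1)[1]
                    yield (message_id, field, address, domain)

    msg = {
        "To": "address1@address1.com,address2@address2.com",
        "From": "sender@example.org",
    }
    rows = list(explode_addresses(1, msg))
    # Each row can now be indexed and joined against the 2K+ domain list,
    # instead of substring-scanning the original strings.
    ```

    Once the data is in this shape, matching against CollectionA becomes an indexed equality lookup on the domain column rather than a string scan.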

    @chx's suggestion of Sphinx seems plausible, at least for accelerating those scans. But there are hidden costs to that route: you have to bundle, install, manage, patch, and update someone else's service alongside your software.

    Minimizing desktop resource consumption during indexing and querying has to be a high priority, and setting up a free-text server on a desktop runs somewhat counter to that charter.

    I'd start either with the basic file system, using filesystem objects to represent your denormalized data, or, if representing and executing your queries that way seems too complex, with a simple embedded database library such as SQLite or SQL Server Compact Edition, before trying to shoehorn more exotic server-targeted products onto the desktop.
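    As a sketch of the embedded-database route, here is what the denormalized data might look like in SQLite (shown via Python's built-in `sqlite3` module; it runs in-process, so there is no server to install on the desktop). All table and column names are illustrative assumptions, not from the question.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")  # use a file path in practice
    con.executescript("""
        CREATE TABLE domains   (name TEXT PRIMARY KEY);      -- CollectionA
        CREATE TABLE addresses (message_id INTEGER, field TEXT,
                                address TEXT, domain TEXT);  -- exploded CollectionB
        CREATE INDEX idx_addr_domain ON addresses(domain);
    """)
    con.executemany("INSERT INTO domains VALUES (?)",
                    [("address1.com",), ("watched.example",)])
    con.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?)",
                    [(1, "To",  "a@address1.com", "address1.com"),
                     (1, "Bcc", "b@other.net",    "other.net")])

    # An indexed join replaces the free-text scan of the raw strings:
    hits = con.execute("""
        SELECT a.message_id, a.address
        FROM addresses a JOIN domains d ON d.name = a.domain
    """).fetchall()
    ```

    The same schema and queries would translate to SQL Server Compact with only minor syntax changes.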

    Nice comparison of SQLite vs. SQL Compact Edition here:

    http://www.tech-archive.net/Archive/DotNet/microsoft.public.dotnet.framework.compactframework/2005-12/msg00019.html

    SQLite can also build free-text indexes, which would cover some of your "unknown field" scenarios in the future.
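    For illustration, a minimal sketch of SQLite's full-text index (the FTS5 module) that could later serve searches over the unknown fields. FTS5 is compiled into most SQLite builds, but availability does depend on the build; the table name and sample data here are made up.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    # A virtual FTS5 table indexes its text columns for full-text search.
    con.execute("CREATE VIRTUAL TABLE msg_text USING fts5(message_id, body)")
    con.executemany("INSERT INTO msg_text VALUES (?, ?)",
                    [("1", "quarterly report attached"),
                     ("2", "lunch on friday?")])

    # MATCH queries the full-text index instead of scanning every row.
    found = con.execute(
        "SELECT message_id FROM msg_text WHERE msg_text MATCH ?",
        ("report",)).fetchall()
    ```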

    As for map/reduce, its strategy is valid for the domain you're approaching.
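    A toy map/reduce sketch of the domain-matching problem, just to show the shape of the strategy: "map" extracts watched-domain hits per message, "reduce" merges the per-message counts. The watched set, function names, and sample data are all illustrative assumptions.

    ```python
    from collections import Counter
    from functools import reduce

    WATCHED = {"address1.com", "watched.example"}  # stands in for CollectionA

    def map_message(message):
        """Map step: count watched domains appearing in one message."""
        hits = Counter()
        for field in ("To", "Bcc", "From"):
            for raw in message.get(field, "").split(","):
                addr = raw.strip().lower()
                if "@" in addr and addr.rsplit("@", 1)[1] in WATCHED:
                    hits[addr.rsplit("@", 1)[1]] += 1
        return hits

    messages = [
        {"To": "a@address1.com,b@other.net"},
        {"From": "c@watched.example", "Bcc": "d@address1.com"},
    ]
    # Reduce step: merge the per-message counters into one total.
    totals = reduce(lambda a, b: a + b, map(map_message, messages), Counter())
    ```

    At desktop scale this runs fine in one process; the same map and reduce functions would partition cleanly across workers if the data ever outgrows that.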