I have a problem and I think a NoSQL solution is the answer but I am not sure. Also, I am not sure what type of NoSQL DB (Object,Document,Graph,Key,etc) would be best suited to solve this problem.
Problem:
I have two collections. CollectionA contains 2K+ strings (domain names). CollectionB is much larger and looks (pseudo) like this:
{
"To" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
"Bcc" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
"From" : "address1@address1.com,address2@address2.com,there_could_be_100@more_address.com",
"Subject" : "Email Subject",
"Unknown" : "NumberOfFields",
"N" : "PlusOneExtraFields",
}
Knowns:
Current line of thinking:
Using a NoSQL solution and maybe the C# dynamic keyword?
Fuzzy
Is this a problem that is easitly solved by a document database?
Is searching/comparing across this type of data structure something that for Map/Reduce?
I totally agree with @HighTechRider, denormalization of data (exploding as you put it) seems a must in this instance for performant queries if the volume of data is as large as you imply, else it doesn't matter what product you pick, it'll end up being a free-text scan of some fashion or other.
@chx's suggestion of Sphinx, seems plausible in at least accelerating the latter. But there are hidden costs to that route - needing you to bundle, install, manage, patch, update etc. someone else's service alongside your software.
Minimizing desktop resource consumption in indexing and query have to be high priority, and setting up a free-text server on a desktop seems somewhat contra that charter.
I'd start with either basic file-system - using filesystem objects to represent your denormalized data. Or if representing and executing your queries seems too complex, look at simple embedded table libraries like SQLite or SQL Compact edition before trying shoehorn more exotic server-targetted products onto the desktop.
Nice comparison of SQLite vs. SQL Compact Edition here:
SQLite can also create free-text indexes that cover some of your "unknown field" scenarios in future.
As for map-reduce, it's strategy is valid for the domain you're approaching.