I am expected to implement a storage and search solution for a large dataset of more than 4 million documents. Each document will have 40 or more fields (search criteria).
I have worked with Lucene and Solr before, so I am inclined to use them for this problem (other ideas and solutions are welcome, of course). What bugs me is efficient and scalable storage. I have been looking at Cassandra, MongoDB, and some other NoSQL solutions, but I can't tell which technology would best fit the requirement.
I would like to ask whether anyone has faced a similar issue and what they used to solve it.
Check this survey paper for general reference:
Survey of Document Oriented Datastores, some metrics available
http://cattell.net/datastores/Datastores.pdf
For IEEE subscribers:
NoSQL evaluation: A use case oriented survey
http://www.computer.org/portal/web/csdl/doi/10.1109/CSC.2011.6138544