Tags: java, search, nosql, storage, large-data-volumes

Storing and searching 4+ million documents


I need to implement a storage and search solution for a large dataset of more than 4 million documents. Each document will have 40 or more fields (that is, search criteria).

I have worked with Lucene and Solr before, so I am inclined to use them for this problem (other ideas and solutions are of course welcome). What bothers me is efficient and scalable storage. I have looked at Cassandra, MongoDB, and some other NoSQL solutions, but I could not determine which technology best fits the requirement.

Has anyone faced a similar problem, and if so, what did you use to solve it?


Solution

  • Check this survey paper for general reference:

    Survey of Document Oriented Datastores, some metrics available
    http://cattell.net/datastores/Datastores.pdf

    For IEEE subscribers:

    NoSQL evaluation: A use case oriented survey
    http://www.computer.org/portal/web/csdl/doi/10.1109/CSC.2011.6138544