ruby-on-rails couchdb hadoop merb hbase

Ruby on Rails/Merb as a frontend for an app with billions of records


I am looking for a backend solution for an application written in Ruby on Rails or Merb that needs to handle several billion records. I suspect I should go with a distributed model, and so far I have looked at:

HBase with Hadoop

CouchDB

The problems as I see them: Ruby support for HBase is not very strong, and CouchDB has not reached version 1.0 yet.

What would you suggest for such a large amount of data?

Imports will sometimes need to be fairly fast, at 30-40 MB at once, but they will arrive in chunks, so about 95% of the time the data will be read-only.


Solution

  • Depending on your actual data usage, MySQL or Postgres should be able to handle a couple of billion records on the right hardware. If you have a particularly high volume of requests, both databases can be replicated across multiple servers, and read replication is quite easy to set up compared to multi-master (write) replication.

    The big advantage of using an RDBMS with Rails or Merb is that you gain access to all of the excellent tool support for these types of databases.

    My advice is to actually profile your data in a couple of these systems and take it from there.
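Read replication pays off here because about 95% of the traffic is reads: writes go to the primary and reads can be spread over the replicas. The following is a minimal, hypothetical sketch of that routing decision in plain Ruby. The class name, the SQL-prefix test, and the symbols standing in for connections are all assumptions for illustration, not the API of Rails, Merb, or any database adapter.

```ruby
# Hypothetical sketch: send reads to replicas (round-robin) and everything
# else to the primary. The "connections" are placeholder symbols; a real
# app would hold actual database connections here.
class ReadWriteRouter
  READ_PATTERN = /\A\s*(SELECT|SHOW|EXPLAIN)\b/i

  def initialize(primary, replicas)
    raise ArgumentError, "need at least one replica" if replicas.empty?
    @primary = primary
    @replicas = replicas
    @next_replica = 0
  end

  # Returns the connection that should run +sql+.
  def connection_for(sql)
    return @primary unless sql.match?(READ_PATTERN)

    # Rotate through the replicas so read load is spread evenly.
    replica = @replicas[@next_replica % @replicas.size]
    @next_replica += 1
    replica
  end
end

router = ReadWriteRouter.new(:primary, [:replica_a, :replica_b])
router.connection_for("SELECT * FROM records WHERE id = 1") # => :replica_a
router.connection_for("select count(*) from records")       # => :replica_b
router.connection_for("INSERT INTO records (id) VALUES (2)") # => :primary
```

Two caveats with this simple approach: replicas lag slightly behind the primary, so a read issued right after an import may not see the new rows yet, and statements such as `SELECT ... FOR UPDATE` belong on the primary even though they start with `SELECT`. The sketch is also not thread-safe.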
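For the chunked 30-40 MB imports, one common way to keep them fast in an RDBMS is to send many rows per statement instead of one `INSERT` per record. Below is a minimal sketch, assuming a hypothetical `records` table with `id` and `name` columns. The hand-rolled quoting is only for illustration; real code should use the database driver's escaping or parameter binding, or a native bulk loader such as PostgreSQL's `COPY` or MySQL's `LOAD DATA INFILE`.

```ruby
# Hypothetical sketch: build multi-row INSERT statements from an import
# chunk. Table and column names are made up for the example.
def quote_value(value)
  case value
  when nil then "NULL"
  when Integer, Float then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'" # double embedded quotes (illustration only)
  end
end

# Yields one INSERT statement per batch of +batch_size+ rows.
def batched_inserts(table, columns, rows, batch_size: 1000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map { |row| "(" + row.map { |v| quote_value(v) }.join(", ") + ")" }
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
  end
end

rows = [[1, "alpha"], [2, "O'Brien"], [3, nil]]
batched_inserts("records", %w[id name], rows, batch_size: 2).each { |sql| puts sql }
# INSERT INTO records (id, name) VALUES (1, 'alpha'), (2, 'O''Brien')
# INSERT INTO records (id, name) VALUES (3, NULL)
```

The batch size is a tuning knob: larger batches mean fewer round trips, but very large statements can hit server limits on packet or statement size, so it is worth measuring on the target database.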
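Profiling can start as simply as timing the same representative read against each candidate system and comparing the mean and a tail percentile. The sketch below uses Ruby's standard `Benchmark.realtime`. The backend names and the lambdas are placeholders that do not touch any database; each would be replaced with a real query (for example, a primary-key lookup) against MySQL, Postgres, HBase, or CouchDB.

```ruby
require "benchmark"

# Hypothetical sketch: run each candidate query +iterations+ times and
# report mean and 95th-percentile latency in milliseconds.
def profile_backends(backends, iterations: 100)
  raise ArgumentError, "iterations must be at least 1" if iterations < 1

  backends.transform_values do |query|
    timings = Array.new(iterations) { Benchmark.realtime { query.call } }.sort
    {
      mean_ms: timings.sum / timings.size * 1000.0,
      p95_ms: timings[(timings.size * 0.95).ceil - 1] * 1000.0
    }
  end
end

# Placeholder workloads standing in for real queries.
backends = {
  in_memory_hash: -> { { 1 => "a" }.fetch(1) },
  sorted_array_search: -> { (1..10_000).to_a.bsearch { |x| x >= 5_000 } }
}

profile_backends(backends, iterations: 50).each do |name, stats|
  printf("%-20s mean %.3f ms  p95 %.3f ms\n", name, stats[:mean_ms], stats[:p95_ms])
end
```

For meaningful numbers, the test data should be close to the real size and distribution, since behavior at a few million rows can differ a lot from behavior at a few billion.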