Tags: mysql, ruby-on-rails, database, solr, document-oriented-db

Document-oriented DBMS as primary DB and an RDBMS as secondary DB?


I'm having some performance issues with a MySQL database due to its normalization.

Most of my applications that use a database need to run heavy nested queries, which in my case take a lot of time. Queries can take up to 2 seconds to run with indexes, and about 45 seconds without.

A solution I came across a few months back was to use a faster, more linear document-based database, in my case Solr, as the primary database. As soon as something changed in the MySQL database, Solr was notified.
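To make the notification step concrete, here is a minimal sketch in plain Ruby. Everything in it is hypothetical: `FakeIndex` stands in for the search index (it is not Solr's API), and `SongRecord` stands in for a model whose save hook pushes a flattened document. In a Rails app this role would typically be played by a model callback; the MySQL write itself is left out.

```ruby
# Hypothetical in-memory stand-in for the search index (not Solr's API).
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  # Adding a document with an existing id overwrites the old one.
  def add(doc)
    @docs[doc[:id]] = doc
  end
end

# Hypothetical record; only the "notify the index on change" step is shown.
class SongRecord
  attr_accessor :id, :title, :artist_name

  def initialize(id:, title:, artist_name:, index:)
    @id = id
    @title = title
    @artist_name = artist_name
    @index = index
  end

  # Persisting to MySQL is omitted; after a save the index is updated.
  def save
    @index.add(to_flat_doc)
    true
  end

  # Flatten the record into the denormalized shape the index stores.
  def to_flat_doc
    { id: id, song: title, artist: artist_name }
  end
end

index = FakeIndex.new
song = SongRecord.new(id: 123, title: "Waterloo", artist_name: "ABBA", index: index)
song.save
song.title = "Waterloo (Remastered)"
song.save
puts index.docs[123][:song]
```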

This worked really well. All queries using the Solr database took only about 3 ms.

The numbers look good, but I'm having some problems.

The MySQL database is about 200 MB; the Solr DB contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.

Like this.

# Controller
@song = Song.first

# View
@song.artist.urls.first.service.name

The problem in my case is that the data returned from Solr is flat, like this:

{
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

This forces me to build an Active Record object that can be passed to the view.
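Since the view only needs an object that answers the same chain of calls, one possible alternative is a lightweight wrapper instead of a full Active Record object. The sketch below uses plain Ruby `Struct`s. The class names (`FlatService`, `FlatUrl`, `FlatArtist`, `FlatSong`) and the `nested_song` helper are hypothetical, and it assumes every URL belongs to the single service named in the flat document.

```ruby
# Hypothetical value objects mirroring the call chain the view uses:
# song.artist.urls.first.service.name
FlatService = Struct.new(:name)
FlatUrl     = Struct.new(:url, :service)
FlatArtist  = Struct.new(:name, :urls)
FlatSong    = Struct.new(:id, :name, :artist)

# Flat document shaped like the Solr result shown above.
FLAT_DOC = {
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}.freeze

# Rebuild the nested shape from the flat fields.
# Assumption: all URLs share the one service given in :service_name.
def nested_song(doc)
  service = FlatService.new(doc[:service_name])
  urls = doc[:urls].map { |u| FlatUrl.new(u, service) }
  artist = FlatArtist.new(doc[:artist], urls)
  FlatSong.new(doc[:id], doc[:song], artist)
end

song = nested_song(FLAT_DOC)
puts song.artist.urls.first.service.name # prints "Groveshark"
```

The view code stays unchanged, because it only depends on the attribute names and not on the class of the object.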

My question

Is there a better way to solve the problem? Some kind of super duper fast primary read only database that can handle complex queries fast would be nice.


Solution

  • Solr individual fields update

    About reindexing everything on a schema change: Solr does not support updating individual fields yet, but there is a JIRA issue about this that's still unresolved. However, how often do you change the schema?

    MongoDB

    If you can live without an RDBMS (without joins, schema, transactions, foreign key constraints), a document-based DB like MongoDB or CouchDB would be a perfect fit. (Here is a good comparison between them.)

    Why use MongoDB:

    Why use SOLR:

    Why use MySQL

    Solutions

    So, the solutions (combinations) would be:

    1. Use MongoDB + Solr

      • but you would still need to reindex all on schema change
    2. Use only MongoDB

      • but drop support for advanced full-text search
    3. Use MySQL in a master-slave configuration and balance reads across the slave(s) (using a plugin like octopus) + Solr

      • setup complexity
    4. Keep current setup, denormalize data in MySQL

      • messy

    Solr reindexing slowness

    The MySQL database is about 200 MB; the Solr DB contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

    Reindexing a 200 MB DB in Solr SHOULD NOT take 12 hours! Most probably you also have other issues, such as:

    MySQL:

    SOLR:

    From http://outoftime.github.com/pivotal-sunspot-presentation.html:

    • By default, Sunspot::Rails commits at the end of every request that updates the Solr index. Turn that off.
      • Use Solr's autoCommit functionality. That's configured in solr/conf/solrconfig.xml
      • Be glad for assumed inconsistency. Don't use search where results need to be up-to-the-second.
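    To illustrate why fewer commits matter, here is a minimal plain-Ruby sketch. It is not Sunspot's or Solr's API and it does not implement autoCommit: `CountingIndex` is a hypothetical stand-in that only counts operations, and `reindex` is a hypothetical helper that sends documents in batches and commits once per batch instead of once per document.

```ruby
# Hypothetical stand-in for a search index client; it only counts calls.
class CountingIndex
  attr_reader :added, :commits

  def initialize
    @added = 0
    @commits = 0
  end

  # Accepts an array of documents in a single call.
  def add(docs)
    @added += docs.size
  end

  def commit
    @commits += 1
  end
end

# Send documents in batches and commit once per batch,
# rather than committing after every single document.
def reindex(index, docs, batch_size: 1000)
  docs.each_slice(batch_size) do |batch|
    index.add(batch)
    index.commit
  end
end

docs = (1..2500).map { |i| { id: i } }
index = CountingIndex.new
reindex(index, docs, batch_size: 1000)
puts "#{index.added} docs, #{index.commits} commits" # prints "2500 docs, 3 commits"
```

    With per-document commits the same run would commit 2500 times; batching brings that down to 3, which is the kind of reduction the advice above is aiming for.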

    Look at the logs for more details.