I have an artists table with over 100,000 records that I use to match against an array (between 1 and several thousands) of artists submitted by the user. My current query looks like this:
SELECT id from artists WHERE lower(name) IN(downcase_artists)
This does the job fine, but I'm wondering whether it could be made faster. Query time varies between a few hundred ms to sometimes 10 whole seconds when it's matching thousands of artists. The name column is indexed. (does that even make a difference on string columns?)
I was thinking maybe something like Redis would speed this up? By keeping a key-value store of the artist name and its corresponding id?
Is there any other option that I'm missing that would speed this up?
EDIT: as James suggested, I tried implementing some kind of all_artists cached method (using the memcache add-on on heroku) and use it to match my strings against it:
artist_ids = self.all_cached.select{|a| downcase_array.include?(a.name)}.collect(&:id)
I gained very small db query time but total request time increased drastically:
Before: Completed 200 OK in 1853ms (Views: 164.2ms | ActiveRecord: 1476.3ms)
After: Completed 200 OK in 12262ms (Views: 169.2ms | ActiveRecord: 1200.6ms)
I'm getting similar results when I run it locally:
Before: Completed 200 OK in 405ms (Views: 75.6ms | ActiveRecord: 135.4ms)
After: Completed 200 OK in 3205ms (Views: 245.1ms | ActiveRecord: 126.5ms)
Putting the ActiveRecord times aside, it looks like taking the string matching off the query worsened my issue (and that's with as few as 100 strings). Or am I missing something?
I also had a look at full-text search engines such as Sphinx but they definitely sound overkill, since I'm only searching through 1 single column...
I ended up using Redis to store not only artist ids and names but the whole json response I return to the user. My Redis hash looks like this:
{"all_artists" => ["artistname1" => "json_response1", "artistname2" => "json_response2"...]}
I do the matching using the following (redis-rb gem):
REDIS.hmget("all_artists", *downcase_array)
That returns all the json strings (including the artist id, name, and upcoming concerts) for the corresponding artists without ever hitting the db. I'm obviously updating the Redis hash every time artists or concerts are updated.
And the resulting time difference (for 100 artists):
Before: Completed 200 OK in 1853ms (Views: 164.2ms | ActiveRecord: 1476.3ms)
Now: Completed 200 OK in 226ms (Views: 127.2ms | ActiveRecord: 48.7ms)
There is still some optimization left to be done but the string matching is definitely out of the way now.