I'm building a rails project, and I have a database with a set of tables.. each holding between 500k and 1M rows, and i am constantly creating new rows.
By the nature of the project, before each creation, I have to search through the table for duplicates (for one field), so i don't create the same row twice. Unfortunately, as my table is growing, this is taking longer and longer.
I was thinking that I could optimize the search by adding indexes to the specific String fields through which i am searching.. but I have heard that adding indexes increases the creation time.
So my question is as follows: What is the trade off with finding and creating rows which contain fields that are indexed? I know adding indexes to the fields will cause my program to be faster with the Model.find_by_name.. but how much slower will it make my row creation?
Indexing slows down insertation of entries because its required to add the entry to the index and that needs some ressources but once added they speed up your select queries, thats like you said BUT maybe the b-tree isnt the right choice for you! Because the B-Tree indexes the first X units of the indexed subject. Thats great when you have integers but text search is tricky. When you do queries like
Model.where("name LIKE ?", "#{params[:name]}%")
it will speed up selection but when you use queries like this:
Model.where("name LIKE ?", "%#{params[:name]}%")
it wont help you because you have to search the whole string which can be longer than some hundred chars and then its not an improvement to have the first 8 units of a 250 char long string indexed! So thats one thing. But theres another....
You should add a UNIQUE INDEX because the database is better in finding duplicates then ruby is! Its optimized for sorting and its definitifly the shorter and cleaner way to deal with this problem! Of cause you should also add a validation to the relevant model but thats not a reason to let things lide with the database.
// about index speed
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
You dont have a large set of options. I dont think the insert speed loss will be that great when you only need one index! But the select speed will increase propotionall!