
Number of reads compared to writes for database indexing


It is commonly said that database indexing only makes sense for large tables with more reads than writes, because maintaining the indices adds write overhead: every modification of the table data also requires a modification of the indices.

If we assume that the index structure is (a) a B+ tree or (b) a hash table, what is a rule of thumb for the ratio of reads to writes at which it starts to make sense to add a database index?
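
One way to make the question concrete is to write the trade-off as a toy cost model: an index pays off when the total time it saves on reads exceeds the total time it adds to writes. The sketch below uses purely hypothetical per-operation costs (`read_saving_ms`, `write_overhead_ms` are illustrative assumptions, not measurements for any particular database or index type) and is meant only to show that any "rule of thumb" ratio depends entirely on those costs.

```python
# Sketch only: a toy cost model for the read/write trade-off described above.
# The per-operation costs are illustrative assumptions, not measured values.

def index_pays_off(reads, writes, read_saving_ms, write_overhead_ms):
    """Return True if the time the index saves on reads exceeds the time
    spent maintaining it on writes, under the assumed per-operation costs."""
    saved = reads * read_saving_ms          # time saved across all reads
    overhead = writes * write_overhead_ms   # extra time spent updating the index
    return saved > overhead

# Example: assume each indexed read saves 4 ms versus scanning, and each write
# costs an extra 0.5 ms of index maintenance (hypothetical figures).
print(index_pays_off(reads=1_000, writes=5_000,
                     read_saving_ms=4.0, write_overhead_ms=0.5))
# -> True: 1,000 * 4 ms = 4,000 ms saved vs. 5,000 * 0.5 ms = 2,500 ms overhead.
# Under this model the break-even point is reads/writes > write_overhead_ms / read_saving_ms.
```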

For information on how database indexing works, check out How does database indexing work?


Solution

  • There are many factors involved here. For example: is the index data in the cache? Are the data blocks in the cache? How many rows does the query retrieve? How many rows are in the table? And so on. I have been asked this question (or a variant of it) many times. Rather than give a number, I prefer to give some information so that the person asking can make an informed decision. Here is a "back of the envelope" calculation with some stated assumptions.

    1. Let's say the table has 100M rows.
    2. Let's say a row is 200 bytes on average.
    3. Let's assume that the index is fully cached and takes zero time to traverse.
    4. Let's assume that the data has to be read from disk, at 5 ms per read.
    5. Let's assume that your IO subsystem can deliver data at 10 GB/s.
    6. Let's assume that you need to read 50,000 rows from your table. Let's also assume that the rows are spread out such that two of the required rows live in the same block.

    OK, now we can do some math.

    Scenario 1. Using an index

    With two required rows per block, fetching 50,000 rows costs roughly 25,000 random block reads. At 5 ms per read, that is 25,000 × 5 ms ≈ 125 seconds.

    Scenario 2. Scanning the table

    The whole table is 100M rows × 200 bytes = 20 GB. At 10 GB/s, reading all of it sequentially takes about 2 seconds.

    Therefore in this case, scanning the data is much faster than using an index.
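
    For readers who prefer to see the arithmetic spelled out, here is a small Python sketch of the same back-of-the-envelope calculation. The constants are exactly the assumptions listed above, not measurements.

```python
# Back-of-the-envelope check of the two scenarios, using the stated assumptions.

ROWS            = 100_000_000   # 100M rows in the table
ROW_SIZE_BYTES  = 200           # average row size
ROWS_NEEDED     = 50_000        # rows the query must return
ROWS_PER_BLOCK  = 2             # two required rows happen to share each block
READ_LATENCY_S  = 0.005         # 5 ms per random block read (index itself assumed cached, free)
SCAN_RATE_BPS   = 10 * 10**9    # 10 GB/s sequential throughput

# Scenario 1: index access -> one random read per block of required rows.
index_seconds = (ROWS_NEEDED / ROWS_PER_BLOCK) * READ_LATENCY_S

# Scenario 2: full table scan -> read every row sequentially.
scan_seconds = (ROWS * ROW_SIZE_BYTES) / SCAN_RATE_BPS

print(f"Index lookups: {index_seconds:.0f} s")  # 25,000 reads * 5 ms = 125 s
print(f"Full scan:     {scan_seconds:.0f} s")   # 20 GB / 10 GB/s    = 2 s
```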