mysqlperformanceoptimization

Why my query is slow even with less data?


I have a Flow Table in MYSQL database. It has the following column taskId, wName, key, status, envNo, updateTime etc. There is no join with other tables. It has the following indexes in place index1(taskId), index2(wName,key,name,evnNo), index3(name), index4(key), index5(updateTime), index6(envNo)

Query1:

Below query giving me 11 records

select * from Flow where status = 0 and wName = 'abc' order by updateTime LIMIT 50 

Running Explain on above query giving below response:

type: index, possible_key: index2, actual_key: index5, rows : 3000, Filtered: 0.05,   Extra : Using Where

Query2:

Below query giving me 50 records

select * from Flow where status = 0 and wName = 'xyz' order by updateTime LIMIT 50

Running Explain on above query giving below response:

type: index, possible_key: index2, actual_key: index5, rows : 108, Filtered: 1.84, Extra : Using Where

Below is the Total avalable records in Flow table for the specified wName.

select count(*) from Flow where wName = 'abc'  - it gives me 3000 records.

select count(*) from Flow where wName = 'xyz'  - it gives me 120000 records.

The problem here is Query1 is responding in 20 seconds and Query2 is responding in less than 1 second. Why it is happening? Query 1 has very less data as compared to Query2.

When i am removing LIMIT 50, both the queries are respnding in milliseconds.

When i am removing order by updateTime, both the queries are respnding in milliseconds.

Please note that i can't update indexes as everything is running fine except this case.

So far i got the below root cause.

  1. Rows might be distributed poorly across storage, leading to fragmentation.
  2. MySQL might not be able to efficiently use the indexes due to low selectivity.

If above is the reason, how do i prove these cause? If this is not the reason, what could be the possible reason for this?


Solution

  • According to your comment, you are mostly interested in knowing why Query1 is slower than Query2 even if it returns less rows, so I start with that.

    Using the index on updateTime (i.e. ordered by that column), your queries go through the table and look for rows that fit your condition.

    Query2 stops after having found 50 rows.

    Query1 cannot stop early, as there are only 13 rows in your table that fit, so it has to read the whole table to the end. Which is obviously slower than stopping earlier.

    To test this, you can increase the limit to a value that is larger than the number of rows you can get for Query2 (so it cannot stop early either), and you should get a similar execution time for both - unless MySQL decides to take a different index then.

    One optimal index for your query would be (wName, status, updateTime).

    But since you cannot (or don't want to) add it, choosing from the list of available indexes (and doing some assumptions on your data distribution), my guess is that the index on wName, ... would be best. Since most rows seem to have a different status than 0, ordering by updateTime is probably not as relevant as MySQL thought.

    This is probably also the index that MySQL picked when you removed LIMIT 50 (although you didn't mention what your primary key is, which might also be an option).

    You can force MySQL to use that index, e.g. try

    select * from Flow force index (index2) 
    where status = 0 and wName = 'abc' order by updateTime LIMIT 50
    

    Query2 will now be slower than Query1, but probably not slower than it is now.

    Note that this is specific to the values you picked and your data distribution, e.g. if you do the same query with where status = 1 (or whatever is most common), the current index could be faster (although my guess is that it wouldn't be). Keep that in mind (and test) if you use this query for different situations, for example if the user can pick a value for "status" and you run this query with the user's choice.