mysqlquery-optimizationsql-likesql-execution-planexplain

MySQL(5.7.26) query optimization, why select * is much faster that select id in my leading wildcard query?


I have a leading wildcard query I know it is hard to optimize. I know if I use only the trailing wildcard I can do some optimization. But our client wants the leading wildcard because when searching mobile number they always use the last N digits.

The start digits is always 1[3-9]xx, as the mobile format China is always 1[3-9]x-xxxx-xxxx, so using like 136% will get too many results and is not that useful.

select id from customer where seid = 134 and telephone like '%18749618910%'

CREATE TABLE `customer` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `seid` bigint(20) unsigned NOT NULL DEFAULT '0' ,
  `ccgeid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `cid` bigint(20) unsigned NOT NULL DEFAULT '0' /* foreign key */
  `telephone` varchar(63) COLLATE utf8mb4_unicode_ci NOT NULL
  `create_time` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`seid`),
  KEY `s_telephone` (`seid`,`telephone`)
  UNIQUE KEY `seid_ccgeid_phone` (`seid`,`ccgeid`,`telephone`),
  KEY `s_cid` (`seid`,`cid`),
)

Resorting to FTS is the long term goal. But since I only need id here I did some experiment to see if I improve some query performance. But I was surprised to see select * ... is faster than select id .... I have around 2.5 millions records, select * alway takes about 1 second while select id takes about 4 to 5 seconds.

The explain Extra information shows select id Using where while select * Using index condition, both with same possible_keys.

But as Bill Karwin explained in What does MySQl explain Extra "Using where" really mean?, Using where does not help, while I can't find a clear explanation for Using index condition (mysql document does not help).

So why select * is much faster than select id in my case ? Since I need id in the end how do I utilize that to optimize my query ?

Based on the comments I got from @Amadan ("all you need to trigger the ICP optimisation is to return any one column that is not found in the index"), I did some further test to confirm his hypothesis (although I have not fully got it) that select id, create_time is as fast as select * because they both use Using index condition while select id, ccgeid is as slow as select id as they both use Using where.

PS. the explain details:

explain select id from customer where seid = 134 and telephone like '%18749618910%'

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: p6
         type: ref
possible_keys: seid_ccgeid_phone,s_cid,s_telephone
          key: s_cid
      key_len: 8
          ref: const
         rows: 1278146
     filtered: 11.11
        Extra: Using where


explain select * from customer where seid = 134 and telephone like '%18749618910%'
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: p6
         type: ref
possible_keys: seid_ccgeid_phone,s_cid,s_telephone
          key: seid_ccgeid_phone
      key_len: 8
          ref: const
         rows: 1278146
     filtered: 11.11
        Extra: Using index condition

--- update ---

According to @RickJames suggestion I added FORCE INDEX(s_telephone) to my query and following are the result:

  1. For select *, with or without FORCE INDEX(s_telephone) there is no obvious performance difference and they both use Using index condition

  2. For select id with FORCE INDEX(s_telephone) there is obvious performance improvement from 4 seconds to around 1.5 seconds, but still slower than select *. It use Using where; Using index now.


Solution

  • I'll try to collate everything I wrote in comments.