mysqlfull-text-searchregexp-like

Mysql query to search posts with title, content and tag


Requirement

I want to make it in one query. How to make an efficient query?

The followings are querys that i made with REGEXP and FULLTEXT.

REGEXP

(SELECT * FROM board
    WHERE
        title rlike 'first' AND title rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        content rlike 'first' AND content rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        tag rlike 'first' AND tag rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
LIMIT 1000;

FULLTEXT

(SELECT * FROM board
    WHERE
        match(title) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        match(content) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        match(tag) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
LIMIT 1000;

As i now, REGEXP doesn't use index but it is faster than fulltext. I don't understand why it happens too.


CREATE STATEMENT

CREATE TABLE `board` (
  `board_id` bigint NOT NULL AUTO_INCREMENT,
  `user_id` bigint NOT NULL,
  `nickname` varchar(255) NOT NULL,
  `category` int NOT NULL,
  `title` varchar(255) NOT NULL,
  `content` text NOT NULL,
  `likes` int NOT NULL DEFAULT '0',
  `hits` int NOT NULL DEFAULT '0',
  `tag` varchar(255) DEFAULT NULL,
  `create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `modify_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`board_id`),
  KEY `popular` (`create_date`,`likes`),
  FULLTEXT KEY `fttitle` (`title`),
  FULLTEXT KEY `ftcontent` (`content`),
  FULLTEXT KEY `fttag` (`tag`)
) ENGINE=InnoDB AUTO_INCREMENT=60027 DEFAULT CHARSET=utf8;


Solution

  • The regexp and Union won't be in any particular order.

    In most formulations, it would be better to fetch the ids in the unions first, then JOIN to get the rest of the columns from the few rows that result.

    Be aware that UNION means UNION DISTINCT and has a dedupping pass. (This is probably what you want, even though it is slower than UNION ALL.)

    Most of the ORDER BY you have shown are useless -- they will do nothing, except (maybe) waste time. ORDER BY and LIMIT need to be paired up for each subquery. And then again after doing the UNION. A related topic: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

    nnichols's answer fails to dedup. That can be remedied by an extra (outer) Select that Groups the answers and picks MIN(col_sort) with GROUP BY id before doing the order by.

    More

    SELECT c.*
        FROM (
            SELECT id, MIN(col_sort) AS col_sort2
                FROM (
                    SELECT id, 1 AS col_sort
                        FROM ... 
                        WHERE MATCH ...
                        LIMIT 100
                    UNION ALL
                    SELECT id, 2 AS col_sort
                        FROM ... 
                        WHERE MATCH ...
                        LIMIT 100
                    UNION ALL
                    SELECT id, 3 AS col_sort
                        FROM ... 
                        WHERE MATCH ...
                        LIMIT 100
                     ) AS a
                GROUP BY id 
             ) AS b
        JOIN board AS c  USING(id)
        ORDER BY b.col_sort2
        LIMIT 100;
    

    Don't use a huge number like 1000 for the LIMIT. It does get rather messy to maintain the 'relevance' through the queries.