mysqlselectconcatenationlocate

Search for multiple different strings in the same set of multiple columns


I need to search for multiple different strings in the same set of multiple columns in a MySQL table.

Here's one way that works (CONCAT / CONCAT_WS):

SELECT * from some_table WHERE 
LOCATE('word1', (CONCAT(column_1, column_2, column_3))) > 0 OR 
LOCATE('word2', (CONCAT(column_1, column_2, column_3))) > 0 OR
LOCATE('word3', (CONCAT(column_1, column_2, column_3))) > 0;

Is this an efficient approach? Is there a way to CONCAT the columns once and apply multiple conditions - something like:

SELECT *, CONCAT(column_1, column_2, column_3) AS combined_columns WHERE
LOCATE ("word1", combined_columns) > 0 OR
LOCATE ("word2", combined_columns) > 0 OR
LOCATE ("word3", combined_columns) > 0;

Solution

  • This approach:

    SELECT * from some_table WHERE 
    LOCATE('word1', (CONCAT(column_1, column_2, column_3))) > 0 OR 
    LOCATE('word2', (CONCAT(column_1, column_2, column_3))) > 0 OR
    LOCATE('word3', (CONCAT(column_1, column_2, column_3))) > 0;
    

    is not very efficient, because you compute CONCAT multiple times. A way to resolve this is to implement a function, like

    DELIMITER //
    
    CREATE FUNCTION is_like(concatenated text, word1 text, word2 text, word3 text) RETURNS INT DETERMINISTIC
    BEGIN
        RETURN
            (LOCATE(word1, concatenated) > 0) OR
            (LOCATE(word2, concatenated) > 0) OR
            (LOCATE(word3, concatenated) > 0)
        ;
    END
    
    //
    
    DELIMITER ;
    

    and you can call it like

    SELECT is_like(concat('abc', 'word', '123'), 'word1', 'word2', 'word3');
    

    and of course you can apply this to your columns too. You can also do a regexp search:

    SELECT CONCAT(column1, column2, column3) REGEXP '.*(word1|word2|word3).*';
    

    The .* part means anything of any length, so the thing between the left and right .* means a match anywhere, finally the (word1|word2|word3) part means word1 or word2 or word3. So it boils down to "let's see whether word1 or word2 or word3 is anywhere in the string".

    I did not compare the performance of the two approaches, so it is up to you do benchmark this on large tests.

    But something to be pointed out: if column1 and column2 and column3 are unrelated, beware of false matches. Example:

    This would be a false match if the columns are unrelated, so you might have to avoid the concatenation. But whether this is or this isn't your case, it's up to your exact requirements.