mariadb-10.4

return all regex match from each row of a table in mysql


I have a table named 'Articles' that has some rows, inside each row has some image tag that I want to return all of those.

I used this query but it only returns one of those tags per row. And other lines are returned empty.

SELECT REGEXP_SUBSTR(body, '([0-9])+\.(jpg|png|gpeg)') from articles;

enter image description here

Article 1:
    <img src"54545343.png" />
    <img src"24352445.png" />
    <img src"24352435.png" />

article 2: 
<img src"24352435.png" />

article 3:
...

I want all of these images.

thank you for your help

UPDATE version 10.4.19-MariaDB


Solution

  • You could use a "dirty solution" like this, but it is horribly inefficient:

    SELECT a.id, REGEXP_SUBSTR(body, '([0-9])+\.(jpg|png|gpeg)', 1, t.n) AS img
    FROM articles a
    CROSS JOIN (
        VALUES
            ROW(1), ROW(2), ROW(3), ROW(4), ROW(5),
            ROW(6), ROW(7), ROW(8), ROW(9), ROW(10)
    ) AS t(n)
    HAVING img IS NOT NULL
    ORDER BY id, img;
    

    The fourth parameter to REGEXP_SUBSTR is:

    occurrence: Which occurrence of a match to search for. If omitted, the default is 1.

    The above query tries to retrieve occurrences 1 - 10.


    This is probably a good example of

    "just because you can, does not mean you should"

    but you could use a stored function to extract the content of the image src attributes:

    DELIMITER //
    CREATE FUNCTION FIND_IMG_SRC(str text, occurrence int)
    RETURNS VARCHAR(255)
    DETERMINISTIC
    BEGIN
        DECLARE loc INT DEFAULT 1;
        DECLARE src_loc INT;
        DECLARE i INT DEFAULT 0;
        DECLARE img_tag text DEFAULT '';
        WHILE(i < occurrence AND loc IS NOT NULL) DO
            SET loc = NULLIF(LOCATE('<img', str, loc + 1), 0);
            SET i = i + 1;
        END WHILE;
        
        IF loc IS NULL THEN
            RETURN NULL;
        ELSE
            # Get complete img tag
            SET img_tag = SUBSTR(str, loc, LOCATE('>', str, loc + 1) - loc + 1);
    
            # Check that img tag contains src attribute
            SET src_loc = LOCATE('src="', img_tag);
            IF src_loc = 0 THEN
                RETURN NULL;
            ELSE
                # Return content of src attribute
                RETURN SUBSTRING_INDEX(SUBSTR(img_tag, src_loc + 5), '"', 1);
            END IF;
        END IF;
    END//
    DELIMITER ;
    

    And then the following query:

    SELECT a.id, FIND_IMG_SRC(body, t.n) AS img
    from articles a
    CROSS JOIN (
        SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
        SELECT 6      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
    ) AS t
    HAVING img IS NOT NULL
    ORDER BY id, img;
    

    And here's a db<>fiddle.

    Note: the above stored function expects src="..." and will not work with single quotes or spaces either side of the =. It will also fail if there is a > anywhere inside the <img>.