
Extract <img> src value from a string which may contain invalid HTML


I have a variable like the one below in PHP.

$content = 'abc def <img src="https://www.example.com/images/abc.png" /> end';

I have to use a regex to remove everything except the img tag's src value, so the final value should be:

$content = 'https://www.example.com/images/abc.png';

I have a regex that does this in Java, but I need to do it in PHP and have not been able to get it working.

Java Code:

Pattern p = Pattern.compile("<img[^>]*src=[\\\"']([^\\\"^']*)");
Matcher m = p.matcher(content);
while (m.find()) {
    String src = m.group();
    int startIndex = src.indexOf("src=") + 5;
    content = src.substring(startIndex, src.length());
    break; // break after first image is found
}

How do I do it?


Solution

  • You are almost there. If you just need the first image, as your code indicates, you could use preg_match() like this:

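    <?php
    // Capture group 1 holds the src value of the first <img> tag.
    // \s before src keeps attributes such as data-src from matching.
    $re = '/<img[^>]*\ssrc=["\']([^"\']*)/i';
    $str = 'abc def <img src="https://www.example.com/images/abc.png" /> end';
    // preg_match() returns 1 on a match, so only read $matches when it succeeds.
    if (preg_match($re, $str, $matches)) {
        echo $matches[1]; // https://www.example.com/images/abc.png
    }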
    

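If every image is needed rather than only the first, the same idea can be extended with preg_match_all(). The following is a minimal sketch, not part of the original answer: the helper name extractImgSrcs and the second sample URL are hypothetical, and the pattern assumes each src value is quoted and closed with the same quote character that opened it.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns the src value of every <img> tag found in $html.
function extractImgSrcs(string $html): array
{
    // (["\']) captures the opening quote; \1 requires the same quote to close the value.
    // The lazy (.*?) in group 2 stops at the first matching closing quote.
    $re = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($re, $html, $matches);

    // With the default PREG_PATTERN_ORDER, $matches[2] lists group 2 for each match.
    return $matches[2];
}

// Sample input: the second tag and its URL are made up for the example.
$str = 'abc <img src="https://www.example.com/images/abc.png" /> def '
     . "<img alt='x' src='https://www.example.com/images/def.png'> end";

print_r(extractImgSrcs($str));
```

Because this is still a regex and not an HTML parser, an src value with a missing closing quote may be skipped or captured past the end of its tag, so the results are worth checking when the input is known to be malformed.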