I have a little problem: how can I find the <img> tag whose src string ends with dealer.jpg and remove only that tag from my content? For example:
<?php
$content = '<b>this is a content</b><img src=http://adress.com/as5.jpg><br> this is a content <img src=http://www.another-adress.com/dealer.jpg>';
$inf = explode("/dealer.jpg", $content);
$string = str_replace("<img src=\"$inf[0]/dealer.jpg\">", "", $content);
?>
I use this because I don't know the full image link; it is unpredictable, but I know that the unwanted img's src value ends with dealer.jpg.
My script is not working. Can someone help me correct it? This will help me remove ads from the page that I've scraped.
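For reference, this is why the script above removes nothing: explode() splits the whole string at "/dealer.jpg", so $inf[0] holds everything before it (most of the content), not just the image URL. The rebuilt search string also wraps the value in double quotes that the content does not contain. A minimal sketch reproducing this with the question's own data:

```php
<?php
// Reproduces the question's attempt to show what explode() actually returns.
$content = '<b>this is a content</b><img src=http://adress.com/as5.jpg><br> this is a content <img src=http://www.another-adress.com/dealer.jpg>';

$inf = explode("/dealer.jpg", $content);

// $inf[0] is everything before "/dealer.jpg", i.e. most of the document,
// not just the image URL
echo $inf[0], "\n";

// So the search string becomes '<img src="<b>this is a content</b>...">',
// which also contains double quotes that $content does not have: no match.
$search = "<img src=\"$inf[0]/dealer.jpg\">";
$string = str_replace($search, "", $content);

var_dump($string === $content); // bool(true): nothing was replaced
```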
If I understood correctly, you are trying to remove the img tag whose src ends with "dealer.jpg" (no matter the domain), right? Try this:
$content = '<b>this is a content</b><img src=http://adress.com/as5.jpg><br> this is a content <img src=http://www.another-adress.com/dealer.jpg>';
$content = preg_replace('/<img src=[A-Za-z0-9_":\.\/-]+\/dealer\.jpg>/', '', $content);
var_dump($content);
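If the file name can vary, the same idea can be wrapped in a small function. This is a minimal sketch: the function name removeImgBySrcSuffix is hypothetical, and, like the regex above, it assumes src is the only attribute in the tag. It uses preg_quote() so the dot in the suffix is matched literally, and it accepts unquoted, single-quoted or double-quoted src values.

```php
<?php
// Hypothetical helper: removes every <img> whose src value ends with $suffix.
// Assumes src is the only attribute (as in the question) and allows the
// value to be unquoted, "double-quoted" or 'single-quoted'.
function removeImgBySrcSuffix(string $html, string $suffix): string
{
    // preg_quote() escapes regex metacharacters in the suffix (e.g. ".")
    // and the "/" delimiter
    $quoted = preg_quote($suffix, '/');

    // [^\s"'>]* = any run of characters that cannot end the attribute value,
    // so the match can never run past the end of the current tag
    $pattern = '/<img src=["\']?[^\s"\'>]*' . $quoted . '["\']?>/i';

    return preg_replace($pattern, '', $html);
}

$content = '<b>this is a content</b><img src=http://adress.com/as5.jpg><br> this is a content <img src=http://www.another-adress.com/dealer.jpg>';
echo removeImgBySrcSuffix($content, '/dealer.jpg'), "\n";
```

Because the pattern requires the closing quote (if any) and ">" right after the suffix, a src such as ".../dealer.jpg.bak" is left alone.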
Edit
This second example matches the img tag even if it has other attributes such as alt, width, etc. (but again, the src must end with "dealer.jpg", and it must be the last attribute in the tag):
$content = '<b>this is a content</b><img src="http://adress.com/as5.jpg"><br> this is a content <img alt="dealer-image" width="120" height="40" src="http://www.another-adress.com/dealer.jpg">';
$content = preg_replace('/<img[A-Za-z0-9_:="\.\/ -]+src="[A-Za-z0-9_:\.\/-]+\/dealer\.jpg">/', '', $content);
var_dump($content);
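Keep in mind that this pattern only works when src is the last attribute, because it requires "> right after dealer.jpg. A quick check on two made-up tags, reusing the pattern above (with A-z narrowed to A-Za-z, since A-z also matches [, \, ], ^ and `):

```php
<?php
// The pattern from the second example, with A-z replaced by A-Za-z.
$pattern = '/<img[A-Za-z0-9_:="\.\/ -]+src="[A-Za-z0-9_:\.\/-]+\/dealer\.jpg">/';

// src is the last attribute: the tag is removed
$srcLast = '<p><img alt="ad" src="http://www.another-adress.com/dealer.jpg"></p>';

// src comes first: the pattern expects "> right after dealer.jpg, so no match
$srcFirst = '<p><img src="http://www.another-adress.com/dealer.jpg" alt="ad"></p>';

var_dump(preg_replace($pattern, '', $srcLast));   // the img is gone
var_dump(preg_replace($pattern, '', $srcFirst));  // unchanged
```

This attribute-order limitation is the reason a DOM parser is the more robust option.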
Note: I changed the initial $content because I noticed it was missing the double quotes around the src attribute values. I'm not sure whether that was a typo or your string really looks like this.
Edit 2
Here is an example using DOM (I guess that is the best approach here, since the order of attributes could change):
$content = '<b>this is a content</b><img src="http://adress.com/as5.jpg"><br> this is a content <img alt="dealer-image" width="120" height="40" src="http://www.another-adress.com/dealer.jpg">';
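// create a DOMDocument from your string, wrapped in a div so there is a single root element
$dom = new DOMDocument();
$dom->loadHTML("<div>{$content}</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
// get all img tags; copy the live DOMNodeList into an array first,
// because removing nodes while looping over the live list skips some of them
$imgs = iterator_to_array($dom->getElementsByTagName('img'));
foreach ($imgs as $img) {
    // if the src ends with "dealer.jpg", remove the tag from $dom
    // (strpos() alone would also match "dealer.jpg" in the middle of a URL,
    // and returns 0, which is falsy, when the match is at the very start)
    $src = $img->getAttribute('src');
    if (substr($src, -strlen('dealer.jpg')) === 'dealer.jpg') {
        $img->parentNode->removeChild($img);
    }
}
// get all the content of the wrapper div, and print it
$newContent = $dom->getElementsByTagName('div')->item(0);
foreach ($newContent->childNodes as $childNode) {
    var_dump($dom->saveHTML($childNode));
}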
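To reuse the DOM approach on every scraped page, it can be wrapped in a function that returns the cleaned HTML as one string instead of dumping each node. A sketch; the function name removeImagesWithSrcSuffix is hypothetical, and it assumes the input is an HTML fragment like the one in the question:

```php
<?php
// Hypothetical helper: parses $html, removes every <img> whose src ends with
// $suffix (regardless of attribute order) and returns the remaining HTML.
function removeImagesWithSrcSuffix(string $html, string $suffix): string
{
    $dom = new DOMDocument();
    // wrap in a div so the fragment has a single root element
    $dom->loadHTML("<div>{$html}</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

    // copy the live node list first; removing while iterating it skips nodes
    foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '' && substr($src, -strlen($suffix)) === $suffix) {
            $img->parentNode->removeChild($img);
        }
    }

    // serialize the children of the wrapper div back into one string
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$content = '<b>this is a content</b><img src="http://adress.com/as5.jpg"><br> this is a content <img alt="dealer-image" width="120" height="40" src="http://www.another-adress.com/dealer.jpg">';
echo removeImagesWithSrcSuffix($content, 'dealer.jpg'), "\n";
```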