I'm trying to parse a document and get all the image tags and change the source for something different.
$domDocument = new DOMDocument();
$domDocument->loadHTML($text);
$imageNodeList = $domDocument->getElementsByTagName('img');
foreach ($imageNodeList as $Image) {
$Image->setAttribute('src', 'lalala');
$domDocument->saveHTML($Image);
}
$text = $domDocument->saveHTML();
The $text
initially looks like this:
<p>Hi, this is a test, here is an image<img src="http://example.com/beer.jpg" width="60" height="95" /> Because I like Beer!</p>
and this is the output $text
:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Hi, this is a test, here is an image<img src="lalala" width="68" height="95"> Because I like Beer!</p></body></html>
I'm getting a bunch of extra tags (HTML, body, and the comment at the top) that I don't really need. Any way to set up the DOMDocument
to avoid adding these extra tags?
DomDocument is unfortunately retarded and won't let you do this. Try this:
$text = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $domDocument->saveHTML()));