I am parsing HTML in PHP, and since I have no control over the original content, I want to strip it of styling and unnecessary tags while still keeping the content and a short list of tags, namely:
p, img, iframe (and maybe a couple of others)
I know I can remove a given tag (see the code I am using for this below), but since I don't necessarily know which tags there could be, and I don't want to maintain a huge list of possibilities, I would like to be able to strip everything except my allowed list.
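function DOMRemove(DOMNode $from) {
    // Move each child up in front of $from, then drop the emptied element.
    // Looping on firstChild also copes with elements that have no children.
    while ($from->firstChild !== null) {
        $from->parentNode->insertBefore($from->firstChild, $from);
    }
    $from->parentNode->removeChild($from);
}
$dom = new DOMDocument;
$dom->loadHTML($html);
// Copy the live node list first; removing nodes while iterating it skips elements.
$nodes = iterator_to_array($dom->getElementsByTagName('span'));
foreach ($nodes as $node) {
    DOMRemove($node);
}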
As cpattersonv1 mentioned above, you can simply use strip_tags() for this.
<?php
// strip all other tags except mentioned (p, img, iframe)
$html_result = strip_tags($html, '<p><img><iframe>');
?>
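One caveat: strip_tags() keeps the attributes on the tags it allows, so an inline style="..." on a <p> would survive. If that matters, a DOM-based whitelist is one option. The following is only a minimal sketch, not a definitive implementation: stripToAllowed() and its parameters are names made up for illustration, and it assumes the input is an HTML fragment (no <html> or <body> of its own) in an encoding that loadHTML() handles.

```php
<?php
// Hypothetical helper: keep only whitelisted tags and attributes.
// Assumes $html is a fragment; encoding handling is left to the caller.
function stripToAllowed(string $html, array $allowedTags, array $allowedAttrs): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on messy markup
    $dom = new DOMDocument();
    // Wrap the fragment in a known container so it is easy to find again.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $dom->getElementsByTagName('div')->item(0);

    // Snapshot the live node list, since unwrapping mutates the tree.
    $elements = iterator_to_array($root->getElementsByTagName('*'));
    foreach ($elements as $el) {
        if (!in_array($el->nodeName, $allowedTags, true)) {
            // Not allowed: move the children up, then remove the empty element.
            while ($el->firstChild !== null) {
                $el->parentNode->insertBefore($el->firstChild, $el);
            }
            $el->parentNode->removeChild($el);
            continue;
        }
        // Allowed tag: drop every attribute that is not whitelisted.
        foreach (iterator_to_array($el->attributes) as $attr) {
            if (!in_array($attr->nodeName, $allowedAttrs, true)) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$dirty = '<p style="color:red">Hi <span class="x">there</span></p><img src="a.png" onclick="x()">';
$clean = stripToAllowed($dirty, ['p', 'img', 'iframe'], ['src']);
echo $clean;
```

Here the span is unwrapped (its text stays inside the p), and the style and onclick attributes are dropped while src is kept. As with strip_tags(), the text inside removed tags is preserved, so adjust the two whitelists to whatever you actually need.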