I would like to remove unsupported HTML tags from content submitted by users (the system defines which tags are supported). For example, if the system only supports the "div" tag, then:
<div><span>Hello</span> <span>World</span></div>
should be converted to:
<div>Hello World</div>
This is my code, using Simple HTML DOM:
function main()
{
    $content = '<div><span>Hello</span> <span>World</span></div>';
    $html = str_get_html($content);
    $html = htmlParser($html);
}

function htmlParser($html)
{
    $supportedTags = ['div'];
    foreach ($html->childNodes() as $node) {
        // Remove unsupported tags
        if (!in_array($node->tag, $supportedTags)) {
            $node->parent()->innertext = str_replace($node->outertext, $node->innertext, $node->parent()->innertext);
            $node->outertext = '';
        }
        if ($node->childNodes()) {
            htmlParser($node);
        }
    }
    return $html;
}
But things go wrong when the content contains nested unsupported tags, e.g.:
<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>
It is converted to:
<div>Hello World <b>!!</b></div>
but the expected result is:
<div>Hello World !!</div>
What is the solution? Should I keep using Simple HTML DOM, or find another way to solve this issue?
Thanks in advance for your help.
After some struggling, I found that I should not modify $node->parent() while looping over its child nodes, and that the child nodes should be processed first, before the node itself is unwrapped. The code should look like this:
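function htmlParser($html)
{
    $supportedTags = ['div'];
    foreach ($html->childNodes() as $node) {
        // Process the children first, so that nested unsupported tags
        // are already unwrapped when this node's innertext is read.
        if ($node->childNodes()) {
            htmlParser($node);
        }
        // Remove unsupported tags: replace the node with its own contents
        // instead of editing the parent's innertext.
        if (!in_array($node->tag, $supportedTags)) {
            $node->outertext = $node->innertext;
        }
    }
    return $html;
}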
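As for "another way": if the goal is only to drop the unsupported tags while keeping their text, PHP's built-in strip_tags() may be enough and avoids walking the DOM at all. Below is a minimal sketch under that assumption. The helper name stripUnsupportedTags is hypothetical (it is not part of Simple HTML DOM), and I have only checked it against the example from the question.

```php
<?php
// Hypothetical helper: keeps only the tags listed in $supportedTags
// and drops every other tag while preserving its text content.
function stripUnsupportedTags(string $content, array $supportedTags): string
{
    // strip_tags() accepts the allowed tags as a string such as '<div><p>'.
    $allowed = '';
    foreach ($supportedTags as $tag) {
        $allowed .= '<' . $tag . '>';
    }
    return strip_tags($content, $allowed);
}

$content = '<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>';
echo stripUnsupportedTags($content, ['div']), PHP_EOL;
// prints: <div>Hello World !!</div>
```

Keep in mind that strip_tags() does not parse or validate the HTML, and it leaves the attributes of allowed tags untouched, so it is probably not sufficient on its own if the input has to be sanitized against malicious markup. In that case a DOM-based approach like the one above is likely the safer starting point.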