// Find all elements that have an id attribute
$ret = $html->find('*[id]');
This is an example of finding all elements that have an id attribute. Is there any way to find all elements? I tried this, but it does not work:
// Find all elements
$ret = $html->find('*');
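For comparison, PHP's built-in DOM extension can express both selectors as XPath queries: //*[@id] matches every element that has an id attribute, and //* matches every element. DOMDocument::loadHTML() wraps a fragment in implied html and body elements, so the sketch below queries //body//* to list only the fragment's own elements. This swaps in DOMDocument/DOMXPath instead of Simple HTML DOM, assumes the dom extension is enabled, and find_all is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: return the tag names of all elements matched by an
 * XPath query, in document order, using PHP's built-in DOM extension
 * (not Simple HTML DOM).
 */
function find_all(string $html, string $xpathQuery): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = $xpath->query($xpathQuery);
    if ($result === false) {
        return []; // invalid XPath expression
    }
    $names = [];
    foreach ($result as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$html = '<div id="a"><span>Hi</span><p id="b">Text</p></div>';
print_r(find_all($html, '//*[@id]'));   // elements with an id attribute
print_r(find_all($html, '//body//*'));  // every element of the fragment
```

The first query returns div and p; the second returns div, span and p, since XPath results come back in document order.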
I want to walk through all the elements in $html so that every parent and child element is fetched. Example:
<div>
<span>
<div>World!</div>
<div>
<span>Hello!</span>
<span>
<div>Hello World!</div>
</span>
</div>
</span>
</div>
Now I want to remove every <span> together with its own plain text, while keeping every <div> (including the ones nested inside a <span>). Expected result:
<div>
<div>World!</div>
<div>
<div>Hello World!</div>
</div>
</div>
/**
 * Refine the input HTML (string) and keep only the specified tags
 *
 * @param string $string  Input HTML
 * @param array  $allowed Tag names to keep, e.g. array('div')
 * @return bool|simple_html_dom The filtered DOM, or false if the input could not be parsed
 */
function crl_parse_html($string, $allowed = array())
{
    // String --> DOM elements; str_get_html() returns false for empty or oversized input
    $dom = str_get_html($string);
    if ($dom === false) {
        return false;
    }
    // Visit every element of the current DOM (one by one)
    foreach ($dom->find('*') as $child) {
        // Does the current inner text contain one or more elements?
        $hasElements = preg_match('/<[^<]+?>/is', $child->innertext);
        // Is the current tag in the array of tags to keep?
        $isAllowed = in_array($child->tag, $allowed);

        if ($hasElements && $isAllowed) {
            // Keep the element and filter its inner HTML recursively
            $child->innertext = crl_parse_html($child->innertext, $allowed);
        } elseif ($hasElements && !$isAllowed) {
            // Strip the first run of plain text that precedes an opening tag
            $child->innertext = preg_replace('/(?<=^|>)[^><]+?(?=<|$)(<[^\/]+?>.+)/is', '$1', $child->innertext);
            // Replace the whole element (outer text) with its filtered inner HTML
            $child->outertext = crl_parse_html($child->innertext, $allowed);
        } elseif (
            // Plain text only, in a tag that is NOT kept...
            (preg_match('/(?<=^|>)[^><]+?(?=<|$)/is', $child->innertext) && !$isAllowed) ||
            // ...or no visible text at all
            trim($child->plaintext) == ''
        ) {
            // Remove the element entirely
            $child->outertext = '';
        }
    }
    return $dom;
}
This is the solution I came up with. I am posting it here in case someone else needs it, and to close this question.
Note that this function is recursive and re-parses each element's inner HTML with str_get_html() at every level, so very large inputs can be a serious problem. Consider this carefully before deciding to use it.
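Since recursion and repeated re-parsing are the main concern, here is a non-recursive sketch of the same idea. It swaps Simple HTML DOM for PHP's built-in DOMDocument (assuming the dom extension is enabled), and strip_disallowed_tags is a hypothetical name. Every element whose tag is not in $allowed loses its own text nodes, its remaining children are moved up to its parent, and the element itself is removed. Handling elements in reverse document order processes children before their parents, so no recursion is needed. It does not reproduce every rule of crl_parse_html (for example, empty allowed elements are kept), and encoding is not handled: loadHTML() may treat input as ISO-8859-1 unless the markup declares a charset.

```php
<?php
/**
 * Hypothetical helper (not part of Simple HTML DOM): remove every element
 * whose tag is not in $allowed, dropping its own text but keeping its
 * child elements, without recursion.
 */
function strip_disallowed_tags(string $html, array $allowed): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps a fragment in <html><body>; work below <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Snapshot all elements, then reverse document order so that
    // descendants are handled before their ancestors.
    $elements = array_reverse(iterator_to_array($body->getElementsByTagName('*'), false));

    foreach ($elements as $el) {
        if (in_array(strtolower($el->nodeName), $allowed, true)) {
            continue; // keep allowed elements as they are
        }
        $parent = $el->parentNode;
        foreach (iterator_to_array($el->childNodes, false) as $child) {
            if ($child instanceof DOMText) {
                $el->removeChild($child);           // drop the element's own text
            } else {
                $parent->insertBefore($child, $el); // move child elements up
            }
        }
        $parent->removeChild($el);
    }

    // Serialize only the fragment, not the implied <html>/<body>.
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

With the markup from the question, strip_disallowed_tags($html, array('div')) should produce the expected nesting, apart from whitespace between tags.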