I am developing a plugin for my WordPress site. I want to select all non-empty paragraph elements.
Here is my code:
function my_php_custom_function($content){
// Create a new DOMDocument instance
$dom = new DOMDocument();
// Load the HTML content into the DOMDocument
$dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Create a DOMXPath object to query the DOM
$xpath = new DOMXPath($dom);
// Find all non-empty p elements in the content
$p_elements = $xpath->query('//p[string-length(normalize-space()) > 0]');
// A the_content filter has to return the content
return $content;
}
add_filter('the_content', 'my_php_custom_function');
In the $p_elements variable I am also getting the paragraphs that I created just by pressing Enter. When I inspect the DOM, they show up as <p>&nbsp;</p>.
You're likely using some sort of WYSIWYG editor for your content, which in some cases produces elements containing only &nbsp; (a non-breaking space).
To get the non-empty p elements while also ignoring p elements that contain only &nbsp;, your XPath could look like the following:
//p[normalize-space() and not(normalize-space(.) = ' ')]
Updated answer:
Apparently, the DOMDocument representation of &nbsp; converts (via bin2hex()) to c2a0. Using this knowledge, we can supply it as the hexadecimal escape sequence instead (\xC2\xA0).
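The bin2hex() check can be reproduced with a few lines. This is only a minimal sketch; the sample markup in $html is an assumption made for illustration and is not taken from the question.

```php
<?php
// Hypothetical sample input: a paragraph holding only a non-breaking space entity.
$html = '<p>&nbsp;</p>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// DOMDocument hands text back as UTF-8, so the decoded entity is the two bytes C2 A0.
$p   = $dom->getElementsByTagName('p')->item(0);
$hex = bin2hex($p->textContent);

echo $hex, PHP_EOL; // c2a0
```

Those two bytes are what the string literal inside the XPath expression has to contain.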
This would render your query to look somewhat like the following:
$p_elements = $xpath->query('//p[normalize-space() and not(normalize-space(.) = "'."\xC2\xA0".'")]');
While not pretty (due to all the escaping), it works in my small tests.
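If paragraphs holding several non-breaking spaces, or a mix of them with ordinary spaces, should be skipped as well (an assumption on my part, the question only shows a single one), a variation is to map \xC2\xA0 to a plain space with XPath's translate() before normalize-space() runs. The sketch below is not the query from above but a swapped-in alternative. The helper name find_non_empty_paragraphs is hypothetical, and the markup is loaded with an XML encoding prefix instead of mb_convert_encoding(), which is a common workaround rather than anything the question uses.

```php
<?php
// Hypothetical helper (not part of WordPress): returns the <p> elements that
// contain something other than whitespace and non-breaking spaces.
function find_non_empty_paragraphs(string $content): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding prefix asks libxml to read the markup as UTF-8.
    // No LIBXML_HTML_* flags here: the implied <html>/<body> wrappers
    // do not matter because the document is only queried.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nbsp  = "\xC2\xA0"; // UTF-8 bytes of U+00A0

    // translate() turns every non-breaking space into a regular space,
    // normalize-space() then collapses the result; an empty string is false.
    $query = '//p[normalize-space(translate(., "' . $nbsp . '", " "))]';

    return iterator_to_array($xpath->query($query));
}

// Usage with made-up sample content.
$sample     = '<p>First</p><p>&nbsp;</p><p> &nbsp; &nbsp;</p><p>Second</p>';
$paragraphs = find_non_empty_paragraphs($sample);

echo count($paragraphs), PHP_EOL; // 2
```

Building the literal in a PHP variable first also keeps the quoting inside the query string a little easier to read.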