php html css dom simpledom

Get the text of elements whose class or ID contains certain keywords


Suppose we want to get the elements (div, span, p, etc.) of an HTML page whose class or ID name contains any of several keywords, such as:

one, two, three

For example, I have this HTML:

<div class="this_one">
  Need this !
</div>
<div id="some_three">
  Need this again !
</div>
<span id="two_this">
  Need again
</span>
<p class="NOT">
Not want this
</p>

That is, I want to get the text inside certain tags (div, p, span) whose ID or class contains one of my keywords (one, two, ...).

How can I detect them?

Any approach is fine, for example SimpleDOM or PHP's DOM extension.


Solution

  • If you want to use PHP, you can use DOMDocument together with DOMXPath. I'm no XPath guru, but you could do something like:

    $sample_markup = '<div class="this_one">
      Need this !
    </div>
    <div id="some_three">
      Need this again !
    </div>
    <span id="two_this">
      Need again
    </span>
    <p class="NOT">
    Not want this
    </p>';
    
    $dom = new DOMDocument();
    $dom->loadHTML($sample_markup);
    $xpath = new DOMXPath($dom);
    
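    // Test @class and @id separately: contains(@class|@id, ...) only looks at
    // the first attribute of the union when an element has both.
    $search = $xpath->query('
        /html/body//*[
            contains(@class, "one") or contains(@id, "one") or
            contains(@class, "two") or contains(@id, "two") or
            contains(@class, "three") or contains(@id, "three")
        ]
    ');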
    
    foreach($search as $node) {
        $value_inside_that_node = trim((string) $node->nodeValue);
        echo $value_inside_that_node . '<br/>';
    }
    

    Should output:

    Need this !
    Need this again !
    Need again
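
If the keyword list changes often, the same idea can be wrapped in a small helper that builds the XPath expression for you. The following is only a sketch, not part of the original answer: the function name `find_text_by_keywords` and its parameters are made up for illustration, it assumes the keywords contain no double quotes, and it restricts matches to div, span and p by default, since those are the tags the question mentions. Matching is a case-sensitive substring test, the same as in the query above.

```php
<?php
// Hypothetical helper: returns the trimmed text of every element (limited to
// $tags) whose class or id attribute contains any of $keywords.
function find_text_by_keywords(string $html, array $keywords, array $tags = ['div', 'span', 'p']): array
{
    if ($html === '' || !$keywords || !$tags) {
        return [];
    }

    // One test per keyword; class and id are checked separately.
    $keyword_tests = [];
    foreach ($keywords as $keyword) {
        // Assumption: keywords never contain a double quote, so they can be
        // embedded in the XPath string literal as-is.
        if (strpos($keyword, '"') !== false) {
            throw new InvalidArgumentException('Keywords must not contain double quotes');
        }
        $keyword_tests[] = sprintf('contains(@class, "%1$s") or contains(@id, "%1$s")', $keyword);
    }

    // One test per allowed tag name, e.g. self::div.
    $tag_tests = [];
    foreach ($tags as $tag) {
        if (!ctype_alnum($tag)) {
            throw new InvalidArgumentException('Tag names must be alphanumeric');
        }
        $tag_tests[] = 'self::' . $tag;
    }

    $query = sprintf('//*[%s][%s]', implode(' or ', $tag_tests), implode(' or ', $keyword_tests));

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Usage with the markup from the question.
$sample_markup = '<div class="this_one">
  Need this !
</div>
<div id="some_three">
  Need this again !
</div>
<span id="two_this">
  Need again
</span>
<p class="NOT">
Not want this
</p>';

echo implode("\n", find_text_by_keywords($sample_markup, ['one', 'two', 'three'])), "\n";
```

Because `contains()` is a plain substring test, a keyword such as "one" would also match a class like "phone"; whether that is acceptable depends on how the class and ID names on the page are chosen.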