php simple-html-dom

Find two tags sequentially


<h2>t1</h2>
<strong>s1</strong>
</hr>
<h2>t2</h2>
<p><strong>s2</strong></p>
<strong>s3</strong>
<strong>s4</strong>
<h2>t3</h2>
<strong>s5</strong>

Each 'h2' tag is followed by an unknown number of 'strong' and 'p' tags; sometimes a 'strong' is embedded in a 'p', and sometimes other tags, such as 'hr', appear. Is there a way to retrieve every 'h2' together with the first 'strong' that follows it? For example, for the above code, I would like to get:

t1
s1
t2
s2
t3
s5

I tried collecting all 'h2' tags in one array and all 'strong' tags in another, but then I could not tell which 'strong' is the first one following each 'h2'.


Solution

  • You can walk the children of the root node in document order:

    include_once 'simple_html_dom.php';
    
    // Parse the sample markup; ->nodes is the parser's flat node list, with the root node at index 0.
    $nodes = str_get_html('<h2>t1</h2>
    <strong>s1</strong>
    </hr>
    <h2>t2</h2>
    <p><strong>s2</strong></p>
    <strong>s3</strong>
    <strong>s4</strong>
    <h2>t3</h2>
    <strong>s5</strong>')->nodes;
    

    Get the sequential list (direct children of the root only, so the 's2' nested inside 'p' is skipped):

    $list = [];
    foreach($nodes[0]->children as $child) {
      if($child->tag == 'h2' || $child->tag == 'strong') {
        $list[] = $child->innertext;
      }
    }
    

    Result

    array (size=7)
      0 => string 't1' (length=2)
      1 => string 's1' (length=2)
      2 => string 't2' (length=2)
      3 => string 's3' (length=2)
      4 => string 's4' (length=2)
      5 => string 't3' (length=2)
      6 => string 's5' (length=2)
    

    Get a nested list (each 'h2' with the top-level 'strong' tags that follow it):

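    $nested = [];
    $a = -1; // index of the current 'h2' group; stays -1 until the first 'h2' is seen
    foreach($nodes[0]->children as $child) {
        if($child->tag == 'h2') {
            // Start a new group for this heading.
            $a++;
            $nested[$a]['value'] = $child->innertext;
        } elseif($child->tag == 'strong' && $a >= 0) {
            // Attach the 'strong' to the latest heading; ignore any that precede the first 'h2'.
            $nested[$a]['children'][] = $child->innertext;
        }
    }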
    

    Result

    array (size=3)
      0 => 
        array (size=2)
          'value' => string 't1' (length=2)
          'children' => 
            array (size=1)
              0 => string 's1' (length=2)
      1 => 
        array (size=2)
          'value' => string 't2' (length=2)
          'children' => 
            array (size=2)
              0 => string 's3' (length=2)
              1 => string 's4' (length=2)
      2 => 
        array (size=2)
          'value' => string 't3' (length=2)
          'children' => 
            array (size=1)
              0 => string 's5' (length=2)
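
Both listings above look only at direct children of the root, so they miss 's2' (it sits inside a 'p') and they keep every 'strong' rather than just the first. A minimal sketch of one way to get exactly the pairs asked for in the question: start from each 'h2', step through its following siblings with next_sibling(), and stop at the first 'strong', whether it is a sibling itself or nested inside one. The helper name firstStrongAfter() is hypothetical (not part of the library), and the sketch assumes each 'h2' shares a parent with the tags that follow it.

```php
<?php
include_once 'simple_html_dom.php';

// Same sample markup as in the question.
$html = str_get_html('<h2>t1</h2>
<strong>s1</strong>
</hr>
<h2>t2</h2>
<p><strong>s2</strong></p>
<strong>s3</strong>
<strong>s4</strong>
<h2>t3</h2>
<strong>s5</strong>');

// Hypothetical helper: returns the first 'strong' found after $h2 and
// before the next 'h2', or null if there is none.
function firstStrongAfter($h2)
{
    for ($node = $h2->next_sibling(); $node !== null; $node = $node->next_sibling()) {
        if ($node->tag === 'h2') {
            return null; // reached the next heading without finding a 'strong'
        }
        if ($node->tag === 'strong') {
            return $node; // the sibling itself is a 'strong'
        }
        // Otherwise look inside the sibling, e.g. <p><strong>s2</strong></p>.
        $nested = $node->find('strong', 0);
        if ($nested !== null) {
            return $nested;
        }
    }
    return null;
}

// Collect [heading text, first strong text] pairs in document order.
$pairs = [];
foreach ($html->find('h2') as $h2) {
    $strong = firstStrongAfter($h2);
    $pairs[] = [$h2->plaintext, $strong !== null ? $strong->plaintext : null];
}

foreach ($pairs as list($title, $first)) {
    echo $title, "\n", $first, "\n";
}
```

For the sample markup this should give the pairs t1/s1, t2/s2 and t3/s5. An 'h2' with no 'strong' before the next 'h2' gets null as its second element, so the caller can decide how to handle that case.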