php text html-parsing href simple-html-dom

Use Simple HTML DOM to parse HTML and build an array of href values and plain text


The snippet below loops through some web pages, fetches the HTML, looks for table.results, and extracts the plain text from the <td> tags contained in each <tr>. $result comes out fine.

Now I'm trying to get the href value of an <a> tag that is found in the second <td> of each <tr>. I'd like to include it in the $result array, but I'm not sure how. The third foreach statement collects the links, but then I would need to merge $links with $result. Ideally I'd get the links inside the second foreach statement as well.

Does anyone know how?

$i = 0;
foreach( $urls as $u )
{           
    $html = file_get_html($u);
    
    foreach($html->find('.results tbody tr') as $element)
    {
        $result[$i] = $this->extract($element->plaintext);
        $i++;                   
    }
    
    foreach($html->find('.results tbody tr a') as $element)
    {
        $links[$i] = $element->href;
        $i++;           
    }                            
}

print_r($result); 
print_r($links); 

die;

Solution

$html = file_get_html($u);
foreach($html->find('.results tbody tr') as $element)
{
    $links = $element->find('a');
    foreach($links as $l) {
        $result[] = $l->href;
    }
    $result[] = $this->extract($element->plaintext);
}
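
The answer above appends hrefs and text to one flat array, so it can be hard to tell which link belongs to which row. Below is a minimal sketch of the same per-row idea that keeps each row's text and links together. Note that it swaps in PHP's built-in DOMDocument and DOMXPath in place of Simple HTML DOM so that it runs without the library. The function name extractRows, the 'text' and 'links' keys, and the sample markup are all assumptions made for illustration. With Simple HTML DOM, the inner lookup would stay $element->find('a'), as in the answer.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not Simple HTML DOM.
// extractRows(), the 'text'/'links' keys and $sample are hypothetical names.

function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // XPath counterpart of the '.results tbody tr' selector.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' results ')]//tbody//tr";
    foreach ($xpath->query($query) as $tr) {
        // Gather the text of each cell in this row.
        $cells = [];
        foreach ($xpath->query('.//td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }

        // Look for anchors inside this row only, so links stay paired with it.
        $links = [];
        foreach ($xpath->query('.//a', $tr) as $a) {
            $links[] = $a->getAttribute('href');
        }

        $rows[] = [
            'text'  => implode(' ', $cells),
            'links' => $links,
        ];
    }
    return $rows;
}

// Made-up markup shaped like the table described in the question.
$sample = '<table class="results"><tbody>'
        . '<tr><td>Alice</td><td><a href="/a">view</a></td></tr>'
        . '<tr><td>Bob</td><td><a href="/b">view</a></td></tr>'
        . '</tbody></table>';

print_r(extractRows($sample));
```

Each entry in the returned array describes one row, so no separate $links array or shared counter is needed, and nothing has to be merged afterwards.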