php domdocument

Using DOMDocument to parse HTML, I would like to get the 'href' contents of an 'a' tag


The code below displays the text between the a tags, but I would like a way to get the href contents as well.

Is there a way to do that using DOMDocument?

$html = file_get_contents($uri);
$html = utf8_decode($html);

/*** a new dom object ***/
$dom = new domDocument;

/*** load the html into the object ***/
@$dom->loadHTML($html);

/*** discard white space ***/
$dom->preserveWhiteSpace = false;

/*** the table by its tag name ***/
$tables = $dom->getElementsByTagName('table');

/*** get all rows from the table ***/
$rows = $tables->item(0)->getElementsByTagName('tr');

/*** loop over the table rows ***/
foreach ($rows as $row)
{
    $a = $row->getElementsByTagName('a');
    /*** echo the values ***/
    echo $a->item(0)->nodeValue.'<br />';
    echo '<hr />';
}

Solution

  • You're mere inches away from the answer -- you've already extracted the <a> tags inside your foreach loop. You're grabbing all of them in a DOMNodeList, so each item in that list will be an instance of DOMElement, which has a method called getAttribute.

    $a->item(0)->getAttribute('href') returns the string value of the href attribute. Tada!


    If a row has no <a> tag, the node list will be empty and item(0) will return null, so calling getAttribute on it would fail. You can guard against this by checking that the first item in the list is an element.

    $href = null;
    $first_anchor_tag = $a->item(0);
    if($first_anchor_tag instanceof DOMElement)
        $href = $first_anchor_tag->getAttribute('href');
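
Putting the pieces above together, here is a minimal sketch of the whole loop with the guard in place. The sample markup and the helper name collectRowLinks are assumptions made up for illustration; in the question the HTML would come from file_get_contents($uri) instead.

```php
<?php
// Hypothetical sample markup standing in for the page fetched from $uri.
$html = <<<HTML
<table>
  <tr><td><a href="/first.html">First</a></td></tr>
  <tr><td>No link in this row</td></tr>
  <tr><td><a href="https://example.com/second">Second</a></td></tr>
</table>
HTML;

/**
 * Collect the text and href of the first <a> in each row of the first table.
 * collectRowLinks is a hypothetical helper name, not part of the DOM extension.
 */
function collectRowLinks(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings about imperfect real-world HTML, as the question does.
    @$dom->loadHTML($html);

    $links = [];
    $table = $dom->getElementsByTagName('table')->item(0);
    if (!$table instanceof DOMElement) {
        return $links; // no table in the document
    }

    foreach ($table->getElementsByTagName('tr') as $row) {
        $anchor = $row->getElementsByTagName('a')->item(0);
        if (!$anchor instanceof DOMElement) {
            continue; // row without a link: item(0) returned null
        }
        $links[] = [
            'text' => $anchor->nodeValue,
            'href' => $anchor->getAttribute('href'),
        ];
    }
    return $links;
}

$links = collectRowLinks($html);
foreach ($links as $link) {
    echo $link['text'], ' => ', $link['href'], PHP_EOL;
}
```

Rows without a link are skipped rather than triggering a call on null, so with the sample markup only the first and third rows should produce a line of output.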