Tags: php, web-scraping, curl, xpath, domdocument

PHP cURL / XPath - Links not working


I'm using the following code to scrape some divs from an external page, http://psnc.org.uk/our-latest-news-category/psnc-news/

I want to scrape the "Latest News" section of PSNC News.

$ch = curl_init("http://psnc.org.uk/our-latest-news-category/psnc-news/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

$document = new DOMDocument;
libxml_use_internal_errors(true);
$document->loadHTML($output);
$xpath = new DOMXPath($document);

$tweets = $xpath->query("//article[@class='news-template-box']");

echo "<html><body>";
foreach ($tweets as $tweet) {
    echo "\n<p>".$tweet->nodeValue."</p>\n";
}
echo "</body></html>";

It successfully scrapes the text, but the links (hrefs), images, and in fact all the HTML elements do not appear.

Am I missing something?


Solution

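  • For an element node, DOMNode::nodeValue is the same as DOMNode::textContent: it returns only the text content, with all markup stripped. To output the element together with its tags and attributes, serialize the node with DOMDocument::saveHTML($node) instead.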

    http://php.net/manual/en/class.domnode.php#domnode.props.nodevalue

    $tweets = $xpath->query("//article[@class='news-template-box']");
    
    foreach ($tweets as $tweet) {
        echo $document->saveHTML($tweet);
    }
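
To see the difference without fetching the live page, here is a minimal sketch that runs both approaches on a small inline HTML string. The markup, the /news/1 link and the variable names $textOnly and $withMarkup are made up for illustration; only the class name news-template-box comes from the question.

```php
<?php
// Hypothetical markup standing in for the page returned by cURL.
$html = '<html><body><article class="news-template-box">'
      . '<h2><a href="/news/1">Headline</a></h2><p>Summary text</p>'
      . '</article></body></html>';

$document = new DOMDocument;
libxml_use_internal_errors(true);   // silence warnings about HTML5 tags such as <article>
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$article = $xpath->query("//article[@class='news-template-box']")->item(0);

// nodeValue: concatenated text of all descendants, no tags or attributes.
$textOnly = $article->nodeValue;

// saveHTML($node): the node serialized as HTML, links and other elements included.
$withMarkup = $document->saveHTML($article);

echo $textOnly, "\n";
echo $withMarkup, "\n";
```

The first echo prints plain text only, while the second keeps the <a href="..."> element, which is the behaviour the question is after.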