php web-scraping curl foreach

How can I echo a scraped div in PHP?


How do I scrape a div by its class and echo it? I tried the code below, but it doesn't work. I am using cURL to fetch the page. I want the div printed exactly as it appears on the actual page.

    $document = new DOMDocument();
    $document->loadHTML($html);
    $selector = new DOMXPath($document);
    $anchors = $selector->query("/html/body//div[@class='resultitem']");
    //a URL you want to retrieve
    
    foreach($anchors as $a) { 
        echo $a;
    }


Solution

  • I put together the snippet below. It uses your logic, with a few tweaks, to display the div with the specified class from the page fetched by file_get_contents. Try plugging in your own values.

    (Note: I turned on error reporting so that bugs show up on the page. It can be helpful to keep it on while you tweak.)

    <?php
    error_reporting(E_ALL);
    ini_set('display_errors', '1');
    
    $url = "http://www.tizag.com/cssT/cssid.php";
    $class_to_scrape="display";
    
    $html = file_get_contents($url);
    $document = new DOMDocument(); 
    $document->loadHTML($html); 
    $selector = new DOMXPath($document); 
    
    $anchors = $selector->query("/html/body//div[@class='". $class_to_scrape ."']");
     
    echo "OK, no PHP syntax errors. <br>Let's see what we scraped.<br>";
    
    foreach ($anchors as $node) {
        $full_content = innerHTML($node);
        echo "<br>" . $full_content . "<br>";
    }
    
    /* This function preserves the inner content (child tags included) of the scraped element.
    ** Source: http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
    ** Be sure to go and give that post an upvote too :)
    **/
    function innerHTML(DOMNode $node)
    {
      $doc = new DOMDocument();
      foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
      }
      return $doc->saveHTML();
    }
    
    
    ?>
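
For what it's worth, the original `echo $a;` fails because `$a` is a DOMElement object, which PHP cannot convert to a string. Since the question mentions cURL and wants the div "just how it is on the actual page", here is a rough sketch of one way that could look. It swaps the `innerHTML()` helper for `DOMDocument::saveHTML($node)`, which serializes the node together with its own opening and closing tags. The function names `fetchHtml` and `scrapeDivs` and the example URL are made up for illustration, and the sketch assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: fetch a page with cURL and return its body, or null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

// Hypothetical helper: return the outer HTML of every <div> that carries $class.
// Assumes $html is non-empty and $class contains no quote characters.
function scrapeDivs(string $html, string $class): array
{
    // Collect libxml warnings internally; real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Match the class as a whole token, so class="resultitem first" matches too,
    // while class="resultitems" does not.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $results = [];
    foreach ($xpath->query($query) as $node) {
        // Passing the node to saveHTML() keeps the div's own tags and attributes.
        $results[] = $document->saveHTML($node);
    }

    return $results;
}

// Usage (placeholder URL, uncomment to try against a real page):
// $html = fetchHtml('https://example.com/search');
// if ($html !== null) {
//     foreach (scrapeDivs($html, 'resultitem') as $div) {
//         echo $div;
//     }
// }
```

Keeping the fetch and the parsing in separate functions means the XPath part can be tried against a saved HTML string without hitting the network. If only the contents of the div are wanted, without the wrapping tag, the `innerHTML()` function from the snippet above is still the way to go.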