
Scrape a statistic from YouTube using PHP


After struggling for three hours to do this on my own, I have concluded that it is either impossible or beyond my ability. My question is as follows:

How can I use PHP to scrape the numbers shown in the attached image so that I can echo them on a webpage?

Image URL: http://gyazo.com/6ee1784a87dcdfb8cdf37e753d82411c

Please help. I have tried almost everything: cURL, regular expressions, and XPath. Nothing has worked correctly.

I only want the numbers themselves, isolated so that I can assign them to variables and echo them elsewhere on the page.

Update:

http://youtube.com/exonianetwork - The URL I am trying to scrape.

/html/body[@class='date-20121213 en_US ltr   ytg-old-clearfix guide-feed-v2 site-left-aligned exp-new-site-width exp-watch7-comment-ui webkit webkit-537']/div[@id='body-container']/div[@id='page-container']/div[@id='page']/div[@id='content']/div[@id='branded-page-default-bg']/div[@id='branded-page-body-container']/div[@id='branded-page-body']/div[@class='channel-tab-content channel-layout-two-column selected   blogger-template ']/div[@class='tab-content-body']/div[@class='secondary-pane']/div[@class='user-profile channel-module yt-uix-c3-module-container ']/div[@class='module-view profile-view-module']/ul[@class='section'][1]/li[@class='user-profile-item '][1]/span[@class='value']

This is the XPath I tried. It didn't work, and I can't tell why: no exceptions or errors were thrown, and nothing was displayed.
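A likely reason the absolute path finds nothing: its first step requires the <body> class attribute to equal, character for character, the string captured in the browser, which includes a date stamp (date-20121213) and browser-specific classes (webkit webkit-537). A page fetched on another day, or by cURL rather than a WebKit browser, will probably carry a different class string, so the query silently returns an empty node list (DOMXPath::query does not throw when nothing matches). The following minimal sketch reproduces this on a small, hypothetical HTML snippet; the markup and both queries are illustrative assumptions, not YouTube's actual page.

```php
<?php
// Hypothetical, heavily simplified markup standing in for the channel page.
// Its <body> class differs from the one captured in the browser (different date,
// no "webkit" classes), as it might when the page is fetched by cURL.
$html = <<<'HTML'
<html><body class="date-20121214 en_US ltr">
<div id="content"><ul class="section">
<li class="user-profile-item "><span class="value">57</span></li>
</ul></div>
</body></html>
HTML;

// Parse without printing libxml warnings about imperfect HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Absolute path pinned to the exact <body> class string seen in a browser.
$absolute = "/html/body[@class='date-20121213 en_US ltr webkit']"
          . "//li[@class='user-profile-item ']/span[@class='value']";
// Short relative path that depends only on the list item and span classes.
$relative = "//li[@class='user-profile-item ']/span[@class='value']";

$absoluteCount = $xpath->query($absolute)->length; // body class does not match
$relativeCount = $xpath->query($relative)->length;

printf("absolute: %d, relative: %d\n", $absoluteCount, $relativeCount);
```

The relative query only breaks if the list item or span classes change, which makes it much easier to debug than a path that encodes every ancestor.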


Solution

  • A simpler, relative XPath is easier to manipulate and debug than the long absolute path in the question.

    Here is a Short, Self-Contained, Correct Example (note the trailing space in the class name 'user-profile-item '):

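    #!/usr/bin/env php
    <?php
    $url = "http://youtube.com/exonianetwork";

    // Fetch the channel page with cURL.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    if (!$html) {
        // Stop here: parsing an empty or failed response is pointless.
        fwrite(STDERR, "Failed to fetch page: " . curl_error($ch) . "\n");
        curl_close($ch);
        exit(1);
    }
    curl_close($ch);

    // Parse the (rarely well-formed) HTML without printing libxml warnings.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Select by class only; note the trailing space in 'user-profile-item '.
    $profile_items = $xpath->query("//li[@class='user-profile-item ']/span[@class='value']");

    if ($profile_items->length === 0) {
        print "No values found\n";
    } else {
        foreach ($profile_items as $profile_item) {
            printf("%s\n", $profile_item->textContent);
        }
    }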
    

    Execute:

    % ./scrape.php
    
    57
    3,593
    10,659,716
    113,900
    United Kingdom
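The values come back as strings with thousands separators (for example 3,593), and the exact-match XPath depends on the trailing space in the class attribute. A minimal sketch addressing both, under stated assumptions: it runs on a hypothetical inline snippet instead of the live page, matches 'user-profile-item' as a whitespace-separated class token (a common XPath idiom, swapped in for the answer's exact comparison), and uses a hypothetical helper, to_number_if_numeric, that strips commas and casts to int while leaving non-numeric text such as a country name unchanged.

```php
<?php
// Hypothetical sample markup; in the real script $html comes from curl_exec().
// The second item has no trailing space in its class, to show the token match still works.
$html = '<ul class="section">'
      . '<li class="user-profile-item "><span class="value">3,593</span></li>'
      . '<li class="user-profile-item"><span class="value">10,659,716</span></li>'
      . '<li class="user-profile-item "><span class="value">United Kingdom</span></li>'
      . '</ul>';

// Hypothetical helper: turn "10,659,716" into 10659716; return other text trimmed.
function to_number_if_numeric($text)
{
    $trimmed = trim($text);
    // Digits grouped with commas ("3,593") or ungrouped digits ("57").
    if (preg_match('/^(?:\d{1,3}(?:,\d{3})*|\d+)$/', $trimmed)) {
        return (int) str_replace(',', '', $trimmed);
    }
    return $trimmed;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Match "user-profile-item" as a class token rather than comparing the whole
// attribute string, so trailing spaces or extra classes do not break the query.
$query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' user-profile-item ')]"
       . "/span[@class='value']";

$values = [];
foreach ($xpath->query($query) as $node) {
    $values[] = to_number_if_numeric($node->textContent);
}

var_dump($values);
```

Individual values can then be assigned to variables, e.g. list($first, $second, $country) = $values; (names hypothetical, since the selected spans carry no labels). The token-based predicate is more verbose than the exact comparison but survives whitespace changes and added classes.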