php web-scraping simple-html-dom

simple_html_dom: trying to find height in Google search


Can anyone explain what is wrong with this code and how I can get the height value? I am trying to get the height of celebrities from Google search results. Any suggestions?

Thanks.

My code (updated with the cURL user agent setting, as advised):

$url='https://www.google.com/webhp?ie=UTF-8#q=ailee+height';

//Set CURL user agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);

$data = curl_exec($ch);
curl_close($ch);

//simple html dom
require_once('lib/simple_html_dom.php');
$html = str_get_html($data);
$height= $html->find('div[class="_eF"]',0)->innertext;
echo $height;

The code above returns an empty result. In this case, I want it to return:

5' 5" (1.65 m)

Solution

  • The problem is that cURL doesn't execute JavaScript, and Google serves a different page when JavaScript is disabled. In this case, the div is replaced by a span with a different class:

    <span class="_m3b">1.65 m</span>
    

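    Also, the link you were using wasn't working for me. Its search term sits after the "#", so it is a URL fragment, and fragments are not sent to the server.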

    Try this instead:

    <?php
    header('Content-Type: text/html; charset=utf-8');
    $url='https://www.google.pt/search?q=ailee+height&num=10&gbv=1';
    
    //Set CURL user agent
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    
    $data = curl_exec($ch);
    curl_close($ch);
    
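    // Parse the response; str_get_html() returns false on empty input
    require_once('simple_html_dom.php');
    $html = str_get_html($data);
    // find() with an index returns null when nothing matches, so check before reading innertext
    $span = $html ? $html->find('span[class="_m3b"]', 0) : null;
    echo $span ? $span->innertext : 'height not found';
    //1.65 m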
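
The difference between the two URLs can be checked without any network request. The sketch below uses PHP's built-in parse_url() to show which part of the question's URL is the query string (sent with the request) and which part is the fragment (kept on the client side). The build_search_url() helper is hypothetical and only illustrates keeping the term in the query string; its parameters are copied from the URL in the answer and may stop working if Google changes its markup or endpoints.

```php
<?php
// The question's URL: the search term follows '#', so it is in the fragment.
$original = 'https://www.google.com/webhp?ie=UTF-8#q=ailee+height';

$query    = parse_url($original, PHP_URL_QUERY);    // part that is sent to the server
$fragment = parse_url($original, PHP_URL_FRAGMENT); // part that stays on the client

echo "query:    $query\n";    // query:    ie=UTF-8
echo "fragment: $fragment\n"; // fragment: q=ailee+height

// Hypothetical helper (not part of the original answer): build a search URL
// that carries the term in the query string, as the answer's URL does.
function build_search_url(string $term): string
{
    $params = [
        'q'   => $term,
        'num' => 10,
        'gbv' => 1, // taken from the answer's URL
    ];

    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'https://www.google.pt/search?' . http_build_query($params, '', '&');
}

$fixed = build_search_url('ailee height');
echo $fixed, "\n";
```

With the question's URL, the server only ever receives ie=UTF-8, so there is no search term to answer. The helper's output can be passed to CURLOPT_URL in place of the hard-coded string.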