php mysql web-scraping simple-html-dom

Simple HTML DOM Parser & Web Browser returns different HTML


I am using PHP Simple HTML DOM to parse a webpage.

Problem: The HTML content scraped seems to differ from what I get when I open the same URL in my web browser. What may have caused the difference, and how can I get the same content with Simple HTML DOM as the web browser displays?

PHP

public function action_asos() {

    include_once('/home/mysite/public_html/application/libraries/simple_html_dom.php');

    $category_url = 'http://www.asos.com/Men/T-Shirts-Vests/Cat/pgecategory.aspx?cid=7616#parentID=-1&pge=0&pgeSize=100&sort=1';

    $html = file_get_html($category_url);

    foreach($html->find('html') as $content) {
        echo $content;
    }

}

Actual page:

http://www.asos.com/Men/T-Shirts-Vests/Cat/pgecategory.aspx?cid=7616#parentID=-1&pge=0&pgeSize=100&sort=1

Retrieved using Simple HTML DOM

[Screenshot of the page as retrieved; image not preserved]


Solution

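  • You need to send a User-Agent header with the request. PHP's HTTP stream wrapper, which file_get_html() relies on, sends no User-Agent unless one is configured, and the lack of one is, for whatever reason, causing the server to choke.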
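
One way to supply a user-agent is to pass a stream context to file_get_html(). This is a minimal sketch, not the only approach: the helper name build_browser_context() and the User-Agent string are made up for illustration, and the commented-out call assumes the same simple_html_dom.php and $category_url as in the question.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): builds a stream
// context whose HTTP requests carry the given User-Agent header.
function build_browser_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
        ),
    ));
}

// Example value only (an assumption); any browser-like string may do.
$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
           . '(KHTML, like Gecko) Chrome/120.0 Safari/537.36';

$context = build_browser_context($userAgent);

// With the library loaded, the context goes in as the third argument.
// Left commented out because it needs network access:
// $html = file_get_html($category_url, false, $context);
```

Setting ini_set('user_agent', $userAgent) before calling file_get_html() should have a similar effect, since the library fetches the page through PHP's HTTP stream wrapper. Whether the server then returns exactly what the browser shows is not guaranteed; a browser also runs the page's JavaScript, which Simple HTML DOM does not.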