$html = file_get_html('http://www.livelifedrive.com/');
echo $html->plaintext;
I've no problem scraping other websites but this particular one returns gibberish.
Is it encrypted or something?
Actually, the gibberish you see is gzip-compressed content.
When I fetch the page with hurl.it, for instance, these are the headers returned by the server:
GET http://www.livelifedrive.com/malaysia/
(the URL http://www.livelifedrive.com/ resolves to http://www.livelifedrive.com/malaysia/)

Connection: keep-alive
Content-Encoding: gzip    <--- The content is gzipped
Content-Length: 18202
Content-Type: text/html; charset=UTF-8
Date: Tue, 31 Dec 2013 10:35:42 GMT
P3p: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Server: nginx/1.4.2
Vary: Accept-Encoding,User-Agent
X-Powered-By: PHP/5.2.17
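If you want to check for this from PHP rather than with an external tool, a minimal sketch could look like the following. The helper name has_gzip_encoding is hypothetical (it is not part of PHP or Simple HTML DOM), and it assumes you already have the raw header lines as an array of strings, for example the kind of array get_headers() returns.

```php
<?php
/**
 * Hypothetical helper: report whether a list of raw response header
 * lines declares gzip content encoding.
 */
function has_gzip_encoding(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence the /i modifier
        if (preg_match('/^Content-Encoding:\s*(.+)$/i', trim($line), $m)
            && stripos($m[1], 'gzip') !== false) {
            return true;
        }
    }
    return false;
}

// Two of the header lines from the response shown above
$headers = [
    'Content-Encoding: gzip',
    'Content-Type: text/html; charset=UTF-8',
];
var_dump(has_gzip_encoding($headers));
```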
So once you have fetched the raw response, decompress it before parsing it. Here is a sample fallback for PHP versions that lack gzdecode():
if ( ! function_exists('gzdecode'))
{
    /**
     * Decode gz coded data
     *
     * http://php.net/manual/en/function.gzdecode.php
     *
     * Alternative: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping
     *
     * @param string $data gzencoded data
     * @return string inflated data
     */
    function gzdecode($data)
    {
        // Strip the header and footer, then inflate the raw deflate stream.
        // This assumes the minimal 10-byte gzip header (no optional fields)
        // and the 8-byte trailer (CRC32 + uncompressed size).
        return gzinflate(substr($data, 10, -8));
    }
}
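To tie this back to the question, here is a rough sketch of decoding only when the body really is gzipped. The helper name decode_if_gzipped is hypothetical, and the sketch assumes PHP 7 or later with the zlib extension, so the built-in gzdecode() is available. It checks for the gzip magic bytes (0x1f 0x8b) instead of trusting the headers.

```php
<?php
/**
 * Hypothetical helper: inflate $body only if it starts with the gzip
 * magic bytes (0x1f 0x8b); otherwise return it unchanged.
 */
function decode_if_gzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip data, nothing to do
    }
    // @ suppresses the warning gzdecode() raises on corrupt input
    $decoded = @gzdecode($body);
    // gzdecode() returns false on failure; fall back to the raw body
    return $decoded === false ? $body : $decoded;
}

// Round-trip check with locally compressed markup (no network needed)
$compressed = gzencode('<p>Hello</p>');
echo decode_if_gzipped($compressed), "\n";
echo decode_if_gzipped('<p>plain</p>'), "\n";
```

With Simple HTML DOM, the decoded string could then be parsed with str_get_html(decode_if_gzipped(file_get_contents($url))) in place of file_get_html($url), where $url is the page you are scraping.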