facebook, web-scraping, curl

Scrape a new page using Open Graph and curl


I am developing a website with a blog on which people can comment via their Facebook accounts. I noticed that whenever I create a new blog post, the comments plugin shows the warning "url is unreachable".

I already figured out that the way to get rid of this warning is to have Facebook scrape the new post.

If I use the following command on the command line:

curl -F "id=http://www.maartenvangenechten.be/blog/post/13/" -F "scrape=true" -k https://graph.facebook.com

the warning disappears, but in the long run this isn't the best way. All the data I put in the meta tags is also output, which tells me the page was scraped successfully.

So I tried using PHP/libcurl for this:
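For anyone who wants to keep the command-line route for now, here is a minimal sketch that wraps the same curl call in a function so it can be reused for each new post. The function name `scrape_url` and the `DRY_RUN` switch are my own additions for illustration, not part of curl or the Graph API. Like the command above it uses `-k`, which skips certificate verification, and newer Graph API versions may also require an access token.

```shell
#!/bin/sh
# Hypothetical helper: ask the Graph API to re-scrape one URL.
# It mirrors the curl command above. DRY_RUN is an assumption added
# for illustration: when set, the command is printed instead of run.
scrape_url() {
    url=$1
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'curl -F id=%s -F scrape=true -k https://graph.facebook.com\n' "$url"
        return 0
    fi
    # -k disables certificate verification, exactly as in the original command
    curl -F "id=$url" -F "scrape=true" -k https://graph.facebook.com
}
```

Calling `scrape_url "http://www.maartenvangenechten.be/blog/post/13/"` should then behave like the one-off command, and setting `DRY_RUN=1` first only prints what would be sent.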

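// $url holds the address of the new blog post,
// e.g. "http://www.maartenvangenechten.be/blog/post/13/"
$params = array(
  "id" => $url,
  "scrape" => "true"
);

$ch = curl_init("https://graph.facebook.com");
curl_setopt_array($ch, array(
  // Return the response as a string instead of printing it directly
  CURLOPT_RETURNTRANSFER => true,
  // Same effect as -k on the command line: skips certificate checks
  CURLOPT_SSL_VERIFYHOST => false,
  CURLOPT_SSL_VERIFYPEER => false,
  // Passing an array sends the fields as multipart/form-data, like curl -F
  CURLOPT_POST => true,
  CURLOPT_POSTFIELDS => $params
));
$result = curl_exec($ch);
curl_close($ch);
echo $result;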

Now this only outputs:

{"id":"214022612077699","url":"http:\/\/www.maartenvangenechten.be\/blog\/post\/13\/"}

and not

{"url":"http:\/\/www.maartenvangenechten.be\/","type":"website","title":"Maartens Homepage","image":[{"url":"http:\/\/www.maartenvangenechten.be\/images\/general\/logo_enlighten.gif"}],"description":"Hier kan je alles vinden over mijn huidige projecten. Bekijk ook zeker de blog, waar ik de verschillende uitdagingen die ik tegenkom zal toelichten","site_name":"VangenechtenDESIGNs","admins":[{"id":"591822147","name":"Maarten Van Genechten","url":"http:\/\/www.facebook.com\/exquisitje"}],"updated_time":"2013-02-22T02:27:18+0000","id":"492686967461912","application":{"id":"482576148470885","name":"MVGPortfolio","url":"http:\/\/www.facebook.com\/apps\/application.php?id=482576148470885"}}

as I would expect.

I can't seem to find out why.
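The two responses above differ in whether the Open Graph fields (title, type, description and so on) are present. A rough way to tell them apart in a script is to look for one of those keys. This is only a sketch: the function name `has_og_data` is hypothetical, and treating the `"title"` key as the marker is my assumption, not something Facebook documents.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a Graph API response appears to
# carry scraped Open Graph data. Assumption: the presence of a "title"
# key is a good enough marker for a full scrape result.
has_og_data() {
    printf '%s' "$1" | grep -q '"title"[[:space:]]*:'
}

# The short response from the question (id and url only)
short='{"id":"214022612077699","url":"http:\/\/www.maartenvangenechten.be\/blog\/post\/13\/"}'

if has_og_data "$short"; then
    echo "full scrape result"
else
    echo "id/url only"
fi
# prints "id/url only"
```

A plain text match like this is crude. A real JSON parser would be more robust, but this is enough to spot the difference between the two outputs shown above.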


Solution

  • Ok, solved it, after searching the web for a couple of hours...

    I tried the script in different browsers: Opera, Firefox, and even IE returned the expected result; only Chrome showed the problem.

    After I cleared the cache, history, and just about everything else Chrome had stored, the problem was gone.