php web-scraping phantomjs

PHP: Scrape data generated with JavaScript (ES6)


I am trying to scrape data from a URL with PhantomJS and php-phantomjs, but the target page generates some of its data with ES6, which PhantomJS doesn't support yet. I get errors like this in the console log:

ReferenceError: Can't find variable: Set

My code is:

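<?php
// Composer autoloader (path assumes the script sits next to vendor/).
require __DIR__ . '/vendor/autoload.php';

use JonnyW\PhantomJs\Client;

$client = Client::getInstance();

// Point the client at the local PhantomJS binary.
$client->getEngine()->setPath('C:\\Users\\XXX\\Desktop\\bin\\phantomjs.exe');

// Build a GET request for the target page and an empty response object to fill.
$request = $client->getMessageFactory()->createRequest('example.com', 'GET');
$response = $client->getMessageFactory()->createResponse();

$client->send($request, $response);

// Dump the console messages captured from the page; the ES6 errors show up here.
var_dump($response->getConsole());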

I searched a lot and found that PhantomJS is supposed to support ES6 in a new version (v2.5), for which a beta has been released, but the beta doesn't work for me.

What can I do now? Is there any way to scrape this page?


Solution

  • Since the future of PhantomJS is not yet certain, may I suggest another headless browser tool: Puppeteer. It drives headless Google Chrome and is backed by a dedicated team of Google engineers.

    There are already projects for controlling it from PHP; the most notable at the moment is PuPHPeteer*

    __
    * (notable in that it can not only produce screenshots and PDFs, but also evaluate JavaScript in the page)
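
    As a rough illustration of the suggestion above, here is a minimal sketch of loading a script-heavy page through PuPHPeteer and reading the rendered result. It assumes the package is installed with `composer require nesk/puphpeteer` and `npm install @nesk/puphpeteer`, and that Node.js is available. The URL, the `networkidle0` wait condition, and the variable names are placeholders chosen for the example, not something taken from the question.

```php
<?php
// Composer autoloader (path assumes the script sits next to vendor/).
require __DIR__ . '/vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer();
$browser = $puppeteer->launch();

try {
    $page = $browser->newPage();

    // Placeholder URL. Waiting for the network to go idle gives
    // script-generated content a chance to render before we read it.
    $page->goto('https://example.com', ['waitUntil' => 'networkidle0']);

    // Full HTML of the page after its scripts have run.
    $html = $page->content();

    // Evaluate JavaScript inside the page. This snippet uses Set, the ES6
    // feature that fails under PhantomJS, to count distinct link targets.
    $uniqueLinkCount = $page->evaluate(JsFunction::createWithBody(
        "return new Set(Array.from(document.links, a => a.href)).size;"
    ));

    var_dump(strlen($html), $uniqueLinkCount);
} finally {
    // Always shut the browser down, even if navigation or evaluation fails.
    $browser->close();
}
```

    Because the page runs in a current Chrome build, ES6 constructs such as `Set` should be available both to the page's own scripts and to the code passed to `evaluate`.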