NodeJS, PhantomJS, content parsing with Cheerio
I need to parse a webpage that contains a dynamically loaded div (a hint). The event can be attached to many table td's; here is an example.
When I mouse over a specific td, I see this orange block with data. It is loaded dynamically by a function, like this:
onmouseover="page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1)"
I can view this info only after the page is loaded. I need one specific row only, Marathonbet.
When the function runs, the text is loaded into another div (id='tooltip') and shown to the user.
I use phantom to parse the content of this page, and everything works for static values, but how can I get this dynamically generated block into the rendered page inside my Node router? I see 2 ways:
Emulate the function call after the page is loaded, since I know the codes ('355svxv498x0x0', 417). But how can I run this function from Node, through phantom?
Here is some code that receives the static page content in my router:
```
phantom.create(config.phantomParams).then(ph => {
    _ph = ph;
    return _ph.createPage();
}).then(page => {
    _page = page;
    return _page.on('onConsoleMessage', function (msg) {
        console.log(msg);
    });
}).then(() => {
    return _page.on('viewportSize', {width: 1920, height: 1080});
}).then(() => {
    return _page.on('dpi', 130)
}).then(() => {
    _page.setting('userAgent', config.userAgent);
    return _page.open(matchLink);
}).then(() => {
    return _page.property('content');
}).then(content => {
    let $ = cheerio.load(content);
    // working with content and get needed elements
    console.log($.html());
}).then(() => {
    _page.close();
    _ph.exit();
});
```
Should I use Casper/Spooky instead? If so, can anyone explain how to use it in this case?
UPD. I tried Puppeteer with this code:
```
let matchLink = 'http://www.oddsportal.com/soccer/world/club-friendly/san-carlos-guadalupe-xnsUg7zB/';
(async () => {
    const browser = await puppeteer.launch({
        args: [
            '--proxy-server=46.101.167.43:80',
        ]
    });
    const page = await browser.newPage();
    await browser.userAgent(config.userAgent);
    await page.setViewport({width: 1440, height: 960});
    await page.goto(matchLink);
    await page.evaluate(() => page.hist(this,'P-0.00-0-0','355svxv464x0x7omg7',381,event,0,1));
    let bodyHTML = await page.evaluate(() => document.body.innerHTML);
    console.log(bodyHTML);
    await page.screenshot({path: 'example.png'});
    await browser.close();
})();
```
I get:
```
(node:8591) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'stopPropagation' of undefined
at toolTip (http://www.oddsportal.com/res/x/global-180713073352.js:1:145511)
at TableSet.historyTooltip (http://www.oddsportal.com/res/x/global-180713073352.js:1:631115)
at PageEvent.PagePrototype.hist (http://www.oddsportal.com/res/x/global-180713073352.js:1:487314)
at __puppeteer_evaluation_script__:1:13
at ExecutionContext.evaluateHandle (/home/gil/Projects/oddsbot/node_modules/puppeteer/lib/ExecutionContext.js:97:13)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
```
The error is thrown inside the site's own JS file; maybe it has something to do with the request.
Since you're open to suggestions, I propose puppeteer. It's a native Node.js module that opens pages in the newest Chromium (especially useful since PhantomJS is very outdated) and is close to PhantomJS in terms of how you do things.
If you also use node.js 8.x, async/await syntax is available for working with promises and it makes scraping with puppeteer a breeze.
So to run that function in puppeteer you would run
await page.evaluate(() => page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1) );
Update
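Calling page.hist directly from page.evaluate fails because, outside a real event handler, `this` is not the td and `event` is undefined, hence the 'stopPropagation' of undefined error. Puppeteer has lots of convenience helpers; one of them is page.hover, which literally hovers the pointer over an element so the site's own handler runs with a real event: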
await page.hover('td.some_selector');
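After hovering, the tooltip still has to be read back. Below is a minimal sketch, assuming the hint ends up in the div with id 'tooltip' (as the question describes) and that the code from the handler appears verbatim in the td's onmouseover attribute. The helper names `cellSelectorForCode` and `readTooltip` are made up for illustration, and `page` is expected to be a Puppeteer Page.

```javascript
// Build a CSS selector for the cell whose inline handler mentions the given code.
// Assumption: the code string appears verbatim in the td's onmouseover attribute.
function cellSelectorForCode(code) {
  const escaped = String(code).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `td[onmouseover*="${escaped}"]`;
}

// Hover the cell, wait until the tooltip div is visible, then return its text.
// `page` is a Puppeteer Page; '#tooltip' is the id mentioned in the question.
async function readTooltip(page, cellSelector, timeoutMs = 5000) {
  await page.hover(cellSelector);
  await page.waitForSelector('#tooltip', { visible: true, timeout: timeoutMs });
  return page.$eval('#tooltip', el => el.textContent.trim());
}

// Usage, inside the async function from the question, after page.goto(matchLink):
//   const text = await readTooltip(page, cellSelectorForCode('355svxv498x0x0'));
//   console.log(text);
```

If the site fills the tooltip some time after showing it, the text may still be empty at this point; in that case waiting on the content itself (for example with page.waitForFunction) is probably the safer choice.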
But should you want to continue using PhantomJS and the excellent phantom module, you can:
_page.evaluate(function() {
page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1)
})
Documentation for page.evaluate: http://phantomjs.org/api/webpage/method/evaluate.html
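The stack trace in the question suggests that the direct page.hist call breaks because `event` is undefined inside the evaluated function. One possible workaround in PhantomJS, sketched below, is to dispatch a synthetic mouseover on the td so that the site's inline handler runs with a real `this` and `event`. The function name `fireMouseOver` and the selector in the usage comment are illustrative assumptions; the function is written in ES5 because it runs inside PhantomJS's older WebKit.

```javascript
// Runs inside the page context, so it only uses ES5 and plain DOM APIs.
// Returns the tooltip's HTML, or null if the cell or the tooltip is missing.
function fireMouseOver(selector) {
  var el = document.querySelector(selector);
  if (!el) { return null; }
  var evt = document.createEvent('MouseEvents');
  // Arguments: type, bubbles, cancelable, view, detail, screenX, screenY,
  // clientX, clientY, ctrlKey, altKey, shiftKey, metaKey, button, relatedTarget
  evt.initMouseEvent('mouseover', true, true, window, 0,
    0, 0, 0, 0, false, false, false, false, 0, null);
  el.dispatchEvent(evt); // triggers the inline onmouseover handler
  var tip = document.getElementById('tooltip'); // id taken from the question
  return tip ? tip.innerHTML : null;
}

// Usage with the phantom module (assumes it forwards extra arguments to the function):
//   _page.evaluate(fireMouseOver, 'td[onmouseover*="355svxv498x0x0"]')
//     .then(function (html) { console.log(html); });
```

This reads the tooltip immediately after the event fires, which only works if the site fills it synchronously; otherwise a second evaluate after a short delay would be needed.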