Nightmare itself is working; I'm still testing the tool, but the main problem is that my function doesn't run in an infinite loop, even though I never added a stop condition on the page counter. Am I doing this wrong?
What I wanted was: whenever a page loads, get its title, then call the function again for the next page, and so on until the last page.
I also tried setTimeout, without success.
My console log just prints the first page and then the script finishes.
Here is the code snippet:
const Nightmare = require('nightmare');
const nightmare = Nightmare();

var pagn = 1;

function ab(page) {
  nightmare.goto(url_base + "&page=" + page)
    .evaluate(() => {
      return document.title;
    })
    .end()
    .then((title) => {
      console.log(title + ":" + page);
      ab(++pagn);
      //setTimeout("page(" + page + ")", 5000);
    })
    .catch(() => { console.log("Error"); });
}

ab(pagn);
The problem is that you are ending your Nightmare session with the .end() call. That tears down the underlying engine, so the recursive call has nothing to run against, and node exits after running through the remaining .then handlers.
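Here is a minimal sketch of that failure mode (the URL is just a placeholder, and the exact behavior may vary slightly between Nightmare versions): once .end() has run, the instance is gone, so a second chain on the same instance never executes and node simply exits.

const Nightmare = require('nightmare');
const nightmare = Nightmare();

nightmare
  .goto('https://example.com') // placeholder URL
  .evaluate(() => document.title)
  .end() // tears down the Electron process behind this instance
  .then((title) => {
    console.log('first run:', title);
    // The instance has already ended, so this chain never runs;
    // node exits once the pending callbacks have finished.
    return nightmare
      .goto('https://example.com')
      .evaluate(() => document.title);
  })
  .then((title) => console.log('never reached:', title))
  .catch((err) => console.error(err));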
To test your code, I rewrote your function a bit so that it scrapes a particular website and stops when it sees the same page title twice (that is just my test scenario, so you might have to adapt it to your case):
const Nightmare = require('nightmare')
const nightmare = Nightmare({ show: true })

function scrapePages(
  targetUrl,
  curPage = 0,
  transform = (url, page) => `${url}?page=${page}`,
  pageSet = new Set()
) {
  console.info('Trying to scrape page ' + transform(targetUrl, curPage));
  return nightmare
    .goto(transform(targetUrl, curPage))
    .evaluate(() => document.title)
    .then((title) => {
      if (pageSet.has(title)) {
        throw 'page already exists';
      }
      pageSet.add(title);
      console.info(title + ':' + curPage);
      return scrapePages(targetUrl, curPage + 1, transform, pageSet);
    })
    .catch((err) => {
      console.error(err);
      return { maxPages: curPage, pages: pageSet };
    });
}

scrapePages('some-paged-url', 0, (url, page) => url + '/' + (page + 1))
  .then(({ maxPages, pages }) => {
    // end nightmare process
    nightmare.end().then(() => {
      console.info(`Found ${maxPages} pages`);
    });
  })
  .catch((err) => console.error('Error occurred', err));
The biggest difference, as you can see, is that the Nightmare process is only ended once the scraping has run through. At that point you have the total page count and the set of pages that were scraped successfully.
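As a hypothetical usage example (the URL and the /p/1, /p/2, ... path scheme are made up), a site with a different pagination style can be handled by passing a custom transform, and the returned set can be inspected afterwards:

scrapePages('https://example.com/articles', 0, (url, page) => `${url}/p/${page + 1}`)
  .then(({ maxPages, pages }) =>
    nightmare.end().then(() => {
      console.info(`Scraped ${maxPages} pages`);
      // pages is the Set of titles collected during the run
      pages.forEach((title) => console.info(' - ' + title));
    })
  )
  .catch((err) => console.error('Error occurred', err));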