I have a PlaywrightCrawler to scrape Alibaba. But when I add a request to one page like:
The page keeps loading until it times out, and handlePageFunction is never called.
In fact, all of the content has already finished loading. I noticed that some AJAX requests keep running in the background.
How can I force PlaywrightCrawler to call handlePageFunction even though those AJAX requests have not completed?
const crawler = new Apify.PlaywrightCrawler({
    requestQueue,
    launchContext: {
        launchOptions: {
            headless: false,
        },
    },
    handlePageFunction,
});
You can change the waitUntil option so that navigation resolves as soon as the DOM has loaded, instead of waiting for the full load event:
const crawler = new Apify.PlaywrightCrawler({
    requestQueue,
    // ...
    preNavigationHooks: [async (context, gotoOptions) => {
        gotoOptions.waitUntil = 'domcontentloaded';
    }],
});
With this setting, handlePageFunction fires as soon as the page is ready to be queried with document.querySelectorAll. Because content rendered by AJAX may not be there yet at that point, you may have to wait for certain conditions inside handlePageFunction before you start calling page methods.
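
As a rough sketch of what "waiting for certain conditions" could look like, the handler below waits for a specific element before reading from the page. The selector `.product-title` and the constant `READY_SELECTOR` are hypothetical placeholders, not taken from the actual Alibaba page, and the 15-second timeout is an arbitrary choice. The handler relies only on Playwright's `page.waitForSelector` and `page.$$eval`.

```javascript
// Hypothetical selector: replace it with an element that only appears
// once the data you need has been rendered on the page.
const READY_SELECTOR = '.product-title';

async function handlePageFunction({ request, page }) {
    // Wait until an element matching the selector is visible,
    // or throw if that does not happen within 15 seconds.
    await page.waitForSelector(READY_SELECTOR, { timeout: 15000 });

    // Runs in the browser context and collects the trimmed text of each match.
    const titles = await page.$$eval(READY_SELECTOR, (elements) =>
        elements.map((el) => el.textContent.trim()));

    console.log(`${request.url}: found ${titles.length} titles`);
    return titles;
}
```

This keeps the navigation itself fast with 'domcontentloaded', and moves the waiting to the one element you actually care about, so unrelated background requests no longer block the handler.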