PlaywrightCrawler continuous loading does not trigger handlePageFunction


I have a PlaywrightCrawler that scrapes Alibaba. But when I add a request for a page like this one:

https://www.alibaba.com/product-detail/Mono-filament-12-mm-PP-fiber_1600139352513.html?spm=a27aq.industry_category_productlist.dt_3.1.3d733642TkHgZc

The page keeps loading until it times out, and handlePageFunction is never called.

In fact, all of the content has already loaded. I noticed that some AJAX requests are still running in the background.

How do I force PlaywrightCrawler to call handlePageFunction even though those AJAX requests have not completed?

const crawler = new Apify.PlaywrightCrawler({
    requestQueue,
    launchContext: {
        launchOptions: {
            headless: false,
        },
    },
    handlePageFunction,
});

Solution

  • You can change the waitUntil option so that navigation resolves as soon as the DOM has loaded, instead of waiting for the full load event:

    const crawler = new Apify.PlaywrightCrawler({
        requestQueue,
        // ...
        preNavigationHooks: [async (context, gotoOptions) => {
            // Resolve navigation on DOMContentLoaded instead of the default 'load' event.
            gotoOptions.waitUntil = 'domcontentloaded';
        }],
    });
    

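    With this setting, handlePageFunction fires as soon as the DOMContentLoaded event is dispatched, at which point the page can be queried with document.querySelectorAll. Content that scripts render later may not be there yet, so you may have to wait for certain conditions inside handlePageFunction before calling page methods.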
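
As a rough illustration of waiting for a condition inside handlePageFunction, here is a minimal sketch. The selector `'h1'` and the returned object are assumptions made for the example, not taken from the question; pick an element that only appears once the data you need has rendered. `page.waitForSelector` and `page.textContent` are standard Playwright page methods.

```javascript
// Hypothetical selector (an assumption): replace it with an element that only
// exists once the data you need has been rendered on the page.
const READY_SELECTOR = 'h1';

// Handler in the handlePageFunction shape: it receives a context object
// that contains the Playwright `page` and the current `request`.
async function handlePageFunction({ page, request }) {
    // With waitUntil: 'domcontentloaded' the handler can start before scripts
    // have rendered the content, so wait for the element explicitly.
    await page.waitForSelector(READY_SELECTOR, { timeout: 30000 });

    // Read the text only after the element is known to exist.
    const title = await page.textContent(READY_SELECTOR);

    // Returned here only for illustration; a real crawler would normally
    // store the result (for example in a dataset) rather than return it.
    return { url: request.url, title: title ? title.trim() : null };
}
```

Pass this function as the handlePageFunction option of the crawler shown above. If the element never appears, waitForSelector rejects once the timeout elapses and the request fails, so the timeout value is worth tuning for slow pages.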