Tags: javascript, node.js, puppeteer, webautomation

Puppeteer: wait for page/DOM updates and respond to new items added after the initial load


I want to use Puppeteer to respond to page updates. The page shows a list of items, and while it stays open new items can appear over time, e.g. a new item is added every 10 seconds.

I can use the following to wait for an item on the initial load of the page:

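// page.waitFor is deprecated and absent from recent Puppeteer releases;
// waitForSelector is the selector-specific call.
await page.waitForSelector(".item");
console.log("the initial items have been loaded");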

How can I wait for / catch future items? I would like to achieve something like this (pseudo code):

await page.goto('http://mysite');
await page.waitForSelector(".item");
// check the initial items

// event fired when new items arrive:
//     check the additional (or all) items

Solution

  • You can use page.exposeFunction to make a Node.js function callable from inside the page:

    await page.exposeFunction('getItem', function(a) {
        console.log(a);
    });
    

    Then use page.evaluate to create a MutationObserver inside the page. It listens for new nodes added under a parent node and passes each one back to Node.js through the exposed function.

    The example below is a rough sketch rather than finished code: it watches the Python chat room on Stack Overflow and prints each new message as it appears.

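    const puppeteer = require('puppeteer');

    (async () => {
        const baseurl = 'https://chat.stackoverflow.com/rooms/6/python';
        const browser = await puppeteer.launch({headless: false});
        const page = await browser.newPage();
        await page.goto(baseurl);

        // Expose a Node.js function to the page; there it is callable as window.getItem.
        await page.exposeFunction('getItem', function(a) {
            console.log(a);
        });

        await page.evaluate(() => {
            // Runs in the browser: report the text of every node added under #chat.
            const observer = new MutationObserver((mutations) => {
                for (const mutation of mutations) {
                    for (const node of mutation.addedNodes) {
                        // Text nodes have no innerText, so fall back to textContent.
                        getItem(node.innerText || node.textContent);
                    }
                }
            });
            observer.observe(document.getElementById("chat"), { attributes: false, childList: true, subtree: true });
        });
    })();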
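
To tie this back to the `.item` elements from the question, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `watchItems` and `onNewItem` are made-up names, `page` is an already opened Puppeteer page, and the items are assumed to appear somewhere under `document.body`. It reports the items that are already present first, then every matching element added later.

```javascript
// Sketch: call onItem(text) for every element matching `selector`,
// both those already on the page and those added afterwards.
// `watchItems` and `onNewItem` are hypothetical names, not part of Puppeteer.
async function watchItems(page, selector, onItem) {
    // Bridge from the browser context back to Node.js.
    await page.exposeFunction('onNewItem', onItem);

    // Everything inside this callback runs in the browser, not in Node.js.
    await page.evaluate((sel) => {
        const seen = new WeakSet();
        const report = (el) => {
            if (seen.has(el)) return; // never report the same element twice
            seen.add(el);
            window.onNewItem(el.innerText);
        };

        // Items that are already present.
        document.querySelectorAll(sel).forEach(report);

        // Items added later. A matching element may be the added node itself
        // or nested somewhere inside it, so both cases are checked.
        const observer = new MutationObserver((mutations) => {
            for (const mutation of mutations) {
                for (const node of mutation.addedNodes) {
                    if (node.nodeType !== Node.ELEMENT_NODE) continue;
                    if (node.matches(sel)) report(node);
                    node.querySelectorAll(sel).forEach(report);
                }
            }
        });
        // Assumption: items appear somewhere under <body>.
        observer.observe(document.body, { childList: true, subtree: true });
    }, selector);
}
```

After `page.goto(...)` it could be called as `await watchItems(page, '.item', (text) => console.log(text));`. Call it only once per page, because `exposeFunction` throws if the same name is registered twice. The observer lives in the current document, so it would have to be set up again after a navigation.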