Tags: python, asynchronous, web-scraping, webautomation, undetected-chromedriver

Nodriver - Detecting Download events at browser level rather than in tabs using cdp events


    import asyncio
    import nodriver as uc
    from nodriver import cdp

    binded_tabs = []
    async def bind_handlers(browser):
        global binded_tabs
        while True:
            await asyncio.sleep(0.01)
            for tab in browser.tabs:
                if tab not in binded_tabs:
                    tab.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))
                    binded_tabs.append(tab)

    async def crawl():

        browser = await uc.start(headless=False)

        asyncio.create_task(bind_handlers(browser))

        await browser.get("https://www.python.org/ftp/python/3.13.0/python-3.13.0-amd64.exe")
        await browser.get("https://code.visualstudio.com/sha/download?build=stable&os=win32-x64-user", new_tab=True)

        while True:
            await asyncio.sleep(0.2)  # Keep the event loop alive

    if __name__ == '__main__':
        uc.loop().run_until_complete(crawl())

The script above detects when a download starts in a tab. However, clicking a download button sometimes redirects to an entirely new tab, and the download begins from there. In that case the download is not detected.

I want to add handlers to every open tab, current or future. How can I do this? Is there a CDP event for this as well? I checked cdp.browser, which has a DownloadWillBegin event class, but when I use cdp.browser.DownloadWillBegin in the code above, the handler (here a simple logging lambda) is never called.

My aim is to detect downloads at the browser level, across every tab that is open now or opened later.

Any help will be greatly appreciated. I have opened an issue on GitHub if you would like to continue there: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/2060

I tried the cdp.page and cdp.browser events with the Tab.add_handler method.


Solution

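  • The script in the question already detects the start of a download on all current and future tabs, because its polling loop attaches the handler to each new tab as soon as it appears in browser.tabs. The test below confirms this: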

    import time
    import asyncio
    import nodriver as uc
    from nodriver import cdp
    
    binded_tabs = []
    async def bind_handlers(browser):
        global binded_tabs
        while True:
            await asyncio.sleep(0.01)
            for tab in browser.tabs:
                if tab not in binded_tabs:
                    tab.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))        
                    binded_tabs.append(tab)
    
    async def crawl():    
    
        browser = await uc.start(headless=False)
    
        asyncio.create_task(bind_handlers(browser))
    
        tab1 = await browser.get("https://code.visualstudio.com/sha/download?build=stable&os=win32-x64-user")
    
        tab2 = await browser.get("https://journals.lww.com/anesthesia-analgesia/fulltext/2024/05000/special_communication__response_to__ensuring_a.2.aspx", new_tab=True)
        
        await asyncio.sleep(5)  # wait for the page to load without blocking the event loop
        
        pdf_button = await tab2.find("//button[contains(., 'PDF')]")
        await pdf_button.click()
    
        while True:
            await asyncio.sleep(0.2)  # Keep the event loop alive
    
    if __name__ == '__main__':
        uc.loop().run_until_complete(crawl())
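
The polling loop in the script above can also be exercised without launching a browser. The following is a minimal sketch of that "bind once per tab" pattern, not nodriver's own API: FakeTab and FakeBrowser are hypothetical stand-ins that only mimic the `tabs` list and the `add_handler` method used above, and the `bound` set and `interval` parameter are illustrative choices. It shows that a tab appearing later still gets exactly one handler.

```python
import asyncio


class FakeTab:
    """Hypothetical stand-in for a nodriver Tab; it only records handlers."""

    def __init__(self, name):
        self.name = name
        self.handlers = []

    def add_handler(self, event_type, handler):
        self.handlers.append((event_type, handler))


class FakeBrowser:
    """Hypothetical stand-in for a nodriver Browser exposing a `tabs` list."""

    def __init__(self):
        self.tabs = []


async def bind_handlers(browser, event_type, handler, bound, interval=0.01):
    """Poll browser.tabs and attach the handler once to every unseen tab."""
    while True:
        for tab in list(browser.tabs):
            if id(tab) not in bound:  # track identity so each tab is bound once
                tab.add_handler(event_type, handler)
                bound.add(id(tab))
        await asyncio.sleep(interval)


async def demo():
    browser = FakeBrowser()
    browser.tabs.append(FakeTab("first"))
    bound = set()
    task = asyncio.create_task(
        bind_handlers(browser, "DownloadWillBegin", print, bound)
    )
    await asyncio.sleep(0.05)  # let the poller see the first tab
    browser.tabs.append(FakeTab("opened-later"))  # simulates a redirect to a new tab
    await asyncio.sleep(0.05)  # give the poller time to pick up the new tab
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # Map each tab name to the number of handlers attached to it.
    return {tab.name: len(tab.handlers) for tab in browser.tabs}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Tracking bound tabs in a set keeps the membership check cheap as the number of tabs grows; with real nodriver tabs, the list-based check from the original script behaves the same way for a small number of tabs.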