Tags: python, html-parsing, undetected-chromedriver

Getting an HTTP response with nodriver


I'm using nodriver, which has no direct support for network methods. It does, however, expose several CDP domains (network: https://ultrafunkamsterdam.github.io/nodriver/nodriver/cdp/network.html) that can be used via tab.send().

I was hoping that someone has already used it to read a response, or to send a request with headers generated from the actual browser session, or something like that...

I'm parsing a website where not all the information is in the HTML code; I need the response that is triggered by an action on the website.

I'm new to Python (and not very experienced with programming at all ^^'), so I have no clue where to start... I even fed the documentation to Gemini (Advanced), but it said that I can't read responses with nodriver, even if I use CDP to create and send a request.

But the CDP commands include "enable" (Enables network tracking, network events will now be delivered to the client.) and "get_response_body" (Returns content served for the given request.), so I can't believe that it's not possible.


Solution

  • It is possible, but to get the response body you first need to obtain the request id. For this, add a handler:

    from nodriver import cdp

    tab = await browser.get('about:blank')
    # call handler for every ResponseReceived event on this tab
    tab.add_handler(cdp.network.ResponseReceived, handler)
    

    Note that you need to have a tab open first. If you know the URL (or at least part of it) of the request whose response you're looking for, your handler could look like this:

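    temp = None  # will hold the request id of the matching response

    async def handler(evt: cdp.network.ResponseReceived):
        global temp
        # skip empty responses, then match on a known part of the URL
        if evt.response.encoded_data_length > 0 and 'something' in evt.response.url:
            temp = evt.request_id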
    

    This saves the request id in the global temp variable, which you can then use to fetch the response body like so:

    # somewhere after tab = await browser.get(URL) so that the temp variable is set
    body, is_base64 = await tab.send(cdp.network.get_response_body(temp))
    

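    Your solution will probably look different, but this is how it works conceptually. Keep in mind that ResponseReceived fires as soon as the response headers arrive, so get_response_body can fail if the body has not finished loading yet; waiting for cdp.network.LoadingFinished with the same request id avoids that.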
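
If you would rather not rely on a global variable, one possible variation is to hand the request id over through an asyncio future, which also lets you wait with a timeout until the response has actually shown up. This is only a sketch: watch_for and decode_body are hypothetical helper names (not part of nodriver), 'something' and URL are the same placeholders as above, and the handler only assumes the event has the request_id, response.url and response.encoded_data_length attributes used earlier.

```python
import asyncio
import base64


def watch_for(url_part: str):
    """Return (handler, future); the future resolves to the first matching request id.

    Hypothetical helper. Call it while an event loop is running,
    e.g. inside your `async def main()`.
    """
    found = asyncio.get_running_loop().create_future()

    async def handler(evt):
        # evt is expected to look like cdp.network.ResponseReceived
        if found.done():
            return  # keep only the first match
        if evt.response.encoded_data_length > 0 and url_part in evt.response.url:
            found.set_result(evt.request_id)

    return handler, found


def decode_body(body: str, is_base64: bool) -> bytes:
    """Turn the (body, is_base64) pair from get_response_body into raw bytes."""
    return base64.b64decode(body) if is_base64 else body.encode("utf-8")


# Intended use with nodriver (not executed here; needs nodriver and a browser):
#     handler, found = watch_for('something')
#     tab.add_handler(cdp.network.ResponseReceived, handler)
#     await tab.get(URL)
#     request_id = await asyncio.wait_for(found, timeout=10)
#     body, is_base64 = await tab.send(cdp.network.get_response_body(request_id))
#     data = decode_body(body, is_base64)
```

The timeout of 10 seconds is arbitrary. If no matching response arrives in time, asyncio.wait_for raises a TimeoutError instead of leaving you with an unset variable.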