python selenium-webdriver http-headers seleniumbase

seleniumbase (undetected Chrome driver): how to set request header?


I am using seleniumbase with Driver(uc=True), which works well for my specific scraping use case (and appears to be the only driver that consistently remains undetected for me).

It is fine for everything that doesn't need specific header settings.

For one particular type of scrape I need to set a request header (Accept: application/json).

This works consistently when done manually in Chrome via the Requestly extension, but I cannot work out how to do the same in seleniumbase undetected Chrome.

I tried using execute_cdp_cmd with Network.setExtraHTTPHeaders (with Network.enable first): this ran without error but the request appeared to ignore it. (I was, tbh, unconvinced that the uc=True support was handling this functionality properly, since it doesn't appear to have full Chromium driver capabilities.)
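The CDP attempt described above would look roughly like the sketch below. The helper name set_accept_header_via_cdp is hypothetical, and driver is assumed to be a Chromium-based Selenium driver that exposes execute_cdp_cmd. As noted, this ran without error under uc=True but did not appear to change the request.

```python
def set_accept_header_via_cdp(driver, accept="application/json"):
    """Ask Chrome, via the DevTools Protocol, to send an extra Accept header."""
    # The Network domain is enabled first, then the extra headers are registered.
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd(
        "Network.setExtraHTTPHeaders",
        {"headers": {"Accept": accept}},
    )
```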

Requestly has a selenium Python mechanism, but this has its own driver and I cannot see how it would integrate with seleniumbase undetected Chrome.

The built-in seleniumbase wire=True support won't coexist with uc=True, as far as I can see.

selenium-requests has an option to piggyback on an existing driver, but this is (to be honest) beyond my embryonic Python skills (though it does feel like this might be the answer if I knew how to put it in place).

My scraping requires initial login, so I can't really swap from one driver to another in the course of the scraping session.


Solution

  • Code fragments from my second working solution, derived from a now-deleted bountied answer (the .v2 import was the piece I had not seen before, and I think it is what made this work):

    ...
    from seleniumwire import webdriver
    from selenium.webdriver.chrome.options import Options
    from seleniumwire.undetected_chromedriver.v2 import Chrome, ChromeOptions
    ...
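    chrome_options = ChromeOptions()
    # Chrome options go in via options=; seleniumwire_options is for selenium-wire's own settings.
    driver = Chrome(options=chrome_options)
    # Override the Accept header on outgoing requests.
    driver.header_overrides = {
        'Accept': 'application/json',
    }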
    ...
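selenium-wire's documentation also describes request interceptors as a way to modify headers. If header_overrides has no effect with your installed version, something along these lines may be worth trying. This is a minimal sketch, not the code from the deleted answer: the function names set_json_accept and make_driver are hypothetical, and it assumes the same seleniumwire.undetected_chromedriver.v2 import used above.

```python
def set_json_accept(request):
    """Request interceptor: force the Accept header to application/json."""
    # selenium-wire headers can hold duplicates, so remove any existing
    # Accept header before adding the new one.
    if "Accept" in request.headers:
        del request.headers["Accept"]
    request.headers["Accept"] = "application/json"


def make_driver():
    """Build a selenium-wire undetected Chrome driver with the interceptor attached."""
    # Imported lazily so the interceptor can be used or tested on its own.
    from seleniumwire.undetected_chromedriver.v2 import Chrome, ChromeOptions

    driver = Chrome(options=ChromeOptions())
    driver.request_interceptor = set_json_accept
    return driver
```

After driver = make_driver(), log in and scrape as usual. The interceptor runs on every request that passes through selenium-wire, so if only some requests need the header, a check on request.url inside set_json_accept can narrow it down.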