python-3.x, selenium-webdriver, decorator, contextmanager, pyvirtualdisplay

Proper way to wrap a context manager into a decorator in Python?


I have several webpages that I would like to scrape using Selenium. I want to automate this and run it on a remote machine. Since each website is different, each script needs different functionality to complete the job. Instead of having every script repeat the same code to start a virtual display and a webdriver, I have a rough idea of using a decorator that starts up a virtual display and a webdriver, like so:

    from typing import Callable

    from pyvirtualdisplay import Display
    from selenium import webdriver

    def open_headless_browser(func: Callable) -> Callable:
        disp = Display(visible=False, size=(100, 100))
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")
        def start() -> None:
            with disp:
                with webdriver.Chrome(options=options) as wd:
                    func()
        return start

Then my scripts (the ones that actually perform the scraping) could look like this:

    from selenium.webdriver.common.by import By

    @open_headless_browser
    def scrape_abc(url_abc: str) -> None:
        driver.get(url_abc)  # NOTE: `driver` is not defined in this scope
        driver.find_elements(By.XPATH, 'abc')

    @open_headless_browser
    def scrape_xyz(url_xyz: str) -> None:
        driver.get(url_xyz)
        driver.find_elements(By.CSS_SELECTOR, 'xyz')

However, a few things concern me:

I am on Python 3.10, Selenium 4.15, and pyvirtualdisplay 3.0.

EDIT: After some thinking, this approach will not work after all. The decorated functions will not have access to the webdriver object defined in the decorator.
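For comparison, the standard library's contextlib.ContextDecorator already lets a context manager double as a decorator, and it runs into exactly this limitation: the object returned by __enter__ is never passed to the decorated function, which only receives the caller's own arguments. A minimal sketch, where FakeSession is a hypothetical stand-in for the display and webdriver setup so it runs without Selenium:

```python
from contextlib import ContextDecorator


class FakeSession(ContextDecorator):
    """Hypothetical stand-in for the display + webdriver setup (no Selenium needed)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def __enter__(self) -> "FakeSession":
        self.events.append("start")
        return self  # only reachable through `with FakeSession() as s:`

    def __exit__(self, *exc: object) -> bool:
        self.events.append("stop")
        return False  # do not swallow exceptions


session = FakeSession()


@session
def scrape(*args: str) -> tuple:
    # Runs inside the session, but receives only the caller's arguments:
    # nothing returned by __enter__ is handed in.
    session.events.append("scrape")
    return args


print(scrape("https://example.com"))  # ('https://example.com',)
print(session.events)  # ['start', 'scrape', 'stop']
```

The session is entered and exited around the call, yet the function gets no driver argument, so a custom decorator is needed to pass it in explicitly.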


Solution

  • EDIT: after some thinking, this approach will not work after all. The decorated functions will not have access to the webdriver object defined in the decorator

    Sure it will. You just need to pass wd as an argument to the decorated function, something like this:

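    import functools
    from typing import Callable

    from pyvirtualdisplay import Display
    from selenium import webdriver

    def open_headless_browser(func: Callable) -> Callable:
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")

        @functools.wraps(func)
        def start() -> None:
            # Create the display per call, so each call gets its own
            # display just as it gets its own driver.
            with Display(visible=False, size=(100, 100)):
                with webdriver.Chrome(options=options) as wd:
                    func(wd)  # hand the driver to the decorated function
        return start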
    

    Then your functions will look like:

    from selenium.webdriver.common.by import By

    @open_headless_browser
    def scrape_abc(driver: webdriver.Chrome) -> None:
        driver.get(url_abc)
        driver.find_elements(By.XPATH, 'abc')

    @open_headless_browser
    def scrape_xyz(driver: webdriver.Chrome) -> None:
        driver.get(url_xyz)
        driver.find_elements(By.XPATH, 'xyz')
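    To check the mechanics without a browser, here is a minimal sketch of the same pattern in which a hypothetical FakeDriver class stands in for webdriver.Chrome (the names FakeDriver and open_browser are illustrative, not part of Selenium). It shows that the wrapper creates a fresh driver per call, passes it in, and closes it afterwards; it also returns func's result, which the version above discards:

```python
import functools
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome so the sketch runs without a browser."""

    def __init__(self) -> None:
        self.visited: list[str] = []
        self.closed = False

    def get(self, url: str) -> None:
        self.visited.append(url)

    def __enter__(self) -> "FakeDriver":
        return self

    def __exit__(self, *exc: object) -> None:
        # The real webdriver.Chrome quits the browser here; the stand-in just records it.
        self.closed = True


def open_browser(func: Callable[[FakeDriver], T]) -> Callable[[], T]:
    @functools.wraps(func)
    def start() -> T:
        # A fresh driver for every call, closed when the with-block exits,
        # even if func raises.
        with FakeDriver() as wd:
            return func(wd)

    return start


@open_browser
def scrape_abc(driver: FakeDriver) -> FakeDriver:
    driver.get("https://example.com/abc")
    return driver  # returned only so it can be inspected afterwards


d = scrape_abc()
print(d.visited)  # ['https://example.com/abc']
print(d.closed)  # True
print(scrape_abc.__name__)  # scrape_abc
```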
    

    If you want to be able to pass in a URL, you need to define arguments in the wrapper function, too:

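    import functools
    from typing import Callable

    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def open_headless_browser(func: Callable) -> Callable:
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--dns-prefetch-disable")

        @functools.wraps(func)
        def start(url: str) -> None:
            # New display and driver for every call; both are closed on exit.
            with Display(visible=False, size=(100, 100)):
                with webdriver.Chrome(options=options) as wd:
                    func(wd, url)  # driver first, then the caller's URL
        return start

    @open_headless_browser
    def scrape_abc(driver: webdriver.Chrome, url: str) -> None:
        driver.get(url)
        driver.find_elements(By.XPATH, 'abc')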
    

    Then it's just a case of remembering that although you define the function as having two arguments, you only call it with one.
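    Going one step closer to the question's title, the shared setup can live in a single contextlib.contextmanager generator, with a small decorator that enters it and forwards any extra arguments via *args/**kwargs, so the same decorator works whether a scraper takes one URL, several, or keyword options. This is a sketch under stated assumptions: FakeDriver, browser_session, and with_browser are hypothetical names, and FakeDriver stands in for webdriver.Chrome so the code runs without Selenium or a virtual display:

```python
import functools
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome (no browser required)."""

    def __init__(self) -> None:
        self.visited: list[str] = []

    def get(self, url: str) -> None:
        self.visited.append(url)

    def quit(self) -> None:
        self.visited.append("<quit>")


events: list[str] = []


@contextmanager
def browser_session() -> Iterator[FakeDriver]:
    # With the real libraries the body would be roughly:
    #     with Display(visible=False, size=(100, 100)), webdriver.Chrome(options=options) as wd:
    #         yield wd
    events.append("display started")
    driver = FakeDriver()
    try:
        yield driver
    finally:
        # Cleanup runs even if the scraping function raises.
        driver.quit()
        events.append("display stopped")


def with_browser(func: Callable[..., Any]) -> Callable[..., Any]:
    """Open a session, pass its driver as the first argument, forward everything else."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        with browser_session() as driver:
            return func(driver, *args, **kwargs)

    return wrapper


@with_browser
def scrape_abc(driver: FakeDriver, url: str) -> list[str]:
    driver.get(url)
    return list(driver.visited)  # snapshot taken before quit() runs


print(scrape_abc("https://example.com/abc"))  # ['https://example.com/abc']
print(events)  # ['display started', 'display stopped']
```

    The call site still passes only the URL. functools.wraps copies scrape_abc's name and docstring onto the wrapper, which helps when logging which scraper ran.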