accessibility, nvda

Automate existing web browser session


How can I programmatically interact with an existing web page in a web browser that was launched in the standard way? For example, I navigate to a specific page and want to run a Python script that fills in some edit fields or clicks some elements.

This should be possible, at least through IAccessible2, for the major browsers, but I did not find any pointers. To put it another way, how do screen readers do it? And a bonus question: is there a Python library for it?

EDIT: I am looking for something more than user input simulation. I would like to programmatically read the DOM at least, and write to it if possible. So far I have looked at the code in NVDA, which is very low-level and complex. Is there anything easier?


Solution

  • How can I programmatically interact with an existing web page in a web browser that was launched in the standard way? For example, I navigate to a specific page and want to run a Python script that fills in some edit fields or clicks some elements.

    If you need to watch the browser while it happens, the answer is keyboard/mouse macros. You can search for macro programs for your OS.

    But you are most likely looking for a headless browser such as PhantomJS, HtmlUnit, TrifleJS, Splash, or SimpleBrowser.

    See https://saucelabs.com/blog/headless-browser-testing-101

    When you mention 'interact with an existing web page in a web browser launched in the standard way', you are talking about the DOM (Document Object Model).

    Many QA environments run test scripts against code that the browser has not rendered into a DOM (the DOM is what you see when you inspect a page with your browser's developer tools). A headless browser builds the DOM and then runs all the tests as if a human were clicking, without anyone having to watch it happen.

    See https://css-tricks.com/dom/
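To make the difference between raw markup and a parsed structure a little more concrete, here is a minimal sketch that uses only Python's standard-library `html.parser`. It is not a headless browser: it runs no JavaScript and builds no real DOM. It only shows how markup text can be turned into element data that a script can query. The sample HTML and the `FormFieldCollector` name are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page source, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <form id="login">
    <input type="text" name="user">
    <input type="password" name="secret">
    <button type="submit">Sign in</button>
  </form>
</body></html>
"""


class FormFieldCollector(HTMLParser):
    """Collects the attributes of every <input> element it encounters."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "input":
            self.fields.append(dict(attrs))


collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)

# Names of the form fields found in the markup.
field_names = [field.get("name") for field in collector.fields]
print(field_names)
```

A real headless browser goes much further than this (scripts, styles, events), but the idea of querying a parsed structure instead of raw text is the same.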

    To put it another way, how do screen readers do it? And a bonus question: is there a Python library for it?

    Screen readers work with the DOM at a low level, through the browser's accessibility APIs (such as IAccessible2, which you mention). I do not know of a Python library for this. It would most likely be overkill anyway, unless you are building a desktop app that interacts with browsers the way a screen reader does.

    Edit:

    I did some more digging and found this article, which gives a much more detailed explanation of how screen readers interact with the browser and the DOM.

    Also, there is a Python API for manipulating the DOM, and this library seemed popular.
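
The links above did not survive, so the following is only a sketch of what reading and writing a DOM from Python can look like. As an assumption, it uses the standard library's `xml.dom.minidom`, which implements the W3C DOM interface; it may not be the API or library the answer had in mind. It works on a document parsed in memory, not on a page in a running browser, and the sample markup is hypothetical.

```python
from xml.dom import minidom

# Hypothetical, well-formed markup. minidom is an XML parser and will
# reject the looser HTML that real web pages often contain.
SOURCE = (
    '<form id="login">'
    '<input name="user" value=""/>'
    '<input name="email" value=""/>'
    '</form>'
)

document = minidom.parseString(SOURCE)

# Read: collect the name of every <input> element in the tree.
inputs = document.getElementsByTagName("input")
names = [node.getAttribute("name") for node in inputs]

# Write: "fill in" the first field by setting its value attribute.
inputs[0].setAttribute("value", "alice")

# Serialize the modified tree back to markup.
serialized = document.documentElement.toxml()
print(names)
print(serialized)
```

The same read-then-modify pattern (`getElementsByTagName`, `getAttribute`, `setAttribute`) is what DOM scripting looks like in general; driving a live browser session additionally needs some bridge into the browser process, which this sketch does not cover.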