python selenium proxy desiredcapabilities

Apply proxy gateway in Selenium webdriver


My goal is to route a Selenium webdriver session through a proxy gateway (e.g. geosurf.io).

  1. I need to do this through DesiredCapabilities, since DesiredCapabilities seems to be the only way to plug in a proxy gateway (source).
  2. DesiredCapabilities works with Selenium Grid (not just with a plain Selenium server). Selenium Grid docs.
  3. I've successfully run Selenium Grid on a local Windows 10 machine.

  4. So I've written the following code to apply DesiredCapabilities and the proxy gateway to the capabilities used by the Selenium webdriver:

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    PROXY = "gw1.geosurf.io:8080" # my account at geosurf.io, port 8080 - Germany
    
    from selenium.webdriver.common.proxy import Proxy, ProxyType
    proxy_object = Proxy()
    proxy_object.proxy_type = ProxyType.MANUAL
    proxy_object.http_proxy = PROXY
    proxy_object.socks_proxy = PROXY
    proxy_object.ssl_proxy = PROXY
    
    keep_alive = True
    browser_profile=None
    
    webdriver.DesiredCapabilities.FIREFOX = {
        "class":"org.openqa.selenium.Proxy",
        "autodetect":False,
        "platform": "WIN10"
    }
    driver = webdriver.Remote("http://192.168.43.98:5566/grid/register", webdriver.DesiredCapabilities.FIREFOX, browser_profile, proxy_object, keep_alive)
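
For reference, the proxy entry that ends up in the capabilities is a plain dictionary. Below is a minimal sketch of building it by hand, without the selenium package. `build_proxy_capabilities` is a hypothetical helper, not part of Selenium; the key names mirror the ones printed by webdriver.py in this post, and leaving out `socksProxy` is an assumption that the gateway is an HTTP proxy.

```python
def build_proxy_capabilities(browser_name, proxy_address):
    """Return a capabilities dict with a manual proxy entry.

    Hypothetical helper: the key names follow the dict printed by
    webdriver.py in this post. socksProxy is omitted on the assumption
    that the gateway speaks HTTP rather than SOCKS.
    """
    return {
        "browserName": browser_name,
        "proxy": {
            "proxyType": "MANUAL",
            "httpProxy": proxy_address,
            "sslProxy": proxy_address,
        },
    }


caps = build_proxy_capabilities("firefox", "gw1.geosurf.io:8080")
print(caps["proxy"]["httpProxy"])
```

A dict like this could be passed as the desired capabilities to webdriver.Remote in place of a modified DesiredCapabilities.FIREFOX.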
    

I added some debug output to __init__ in webdriver.py (C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py) and got the following when running the above code:

command_executor:  http://192.168.43.98:5566/grid/register
capabilities: 
{'autodetect': False,
 'class': 'org.openqa.selenium.Proxy',
 'platform': 'WIN10',
 'proxy': {'httpProxy': 'gw1.geosurf.io:8080',
           'proxyType': 'MANUAL',
           'socksProxy': 'gw1.geosurf.io:8080',
           'sslProxy': 'gw1.geosurf.io:8080'}}

Yet the script fails inside webdriver.py:

Traceback (most recent call last):
  File "C:\Users\User\Documents\RnD\captcha-test\test_geosurf_proxy_gateway.py", line 21, in <module>
    driver = webdriver.Remote("http://192.168.43.98:5566/grid/register", webdriver.DesiredCapabilities.FIREFOX, browser_profile, proxy_object, keep_alive)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 99, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 191, in start_session
    self.session_id = response['sessionId']
TypeError: string indices must be integers

The error, TypeError: string indices must be integers, does not seem to be related to the proxy gateway type or to the DesiredCapabilities settings.

Printing the response variable at line 190 shows that it is a string containing an HTML snippet:

     <!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <link rel="stylesheet" type="text/css" href="/assets/displayhelpservlet.css" media="all"/>
  <link href="/assets/favicon.ico" rel="icon" type="image/x-icon" />
  <script src="/assets/jquery-3.1.1.min.js" type="text/javascript"></script>
  <script src="/assets/displayhelpservlet.js" type="text/javascript"></script>
  <script type="text/javascript">
    var json = Object.freeze('{"version":"3.4.0","type":"Grid Node","consoleLink":"/wd/hub"}');
  </script>
</head>
<body>

<div id="content">
  <div id="help-heading">
    <h1><span id="logo"></span></h1>
    <h2>Selenium <span class="se-type"></span>&nbsp;v.<span class="se-version"></span></h2>
  </div>

  <div id="content-body">
    <p>
      Whoops! The URL specified routes to this help page.
    </p>
    <p>
      For more information about Selenium <span class="se-type"></span> please see the
      <a class="se-docs">docs</a> and/or visit the <a class="se-wiki">wiki</a>.
      <span id="console-item">
        Or perhaps you are looking for the Selenium <span class="se-type"></span> <a class="se-console">console</a>.
      </span>
    </p>
    <p>
      Happy Testing!
    </p>
  </div>

  <div>
    <footer id="help-footer">
      Selenium is made possible through the efforts of our open source community, contributions from
      these <a href="https://github.com/SeleniumHQ/selenium/blob/master/AUTHORS">people</a>, and our
      <a href="http://www.seleniumhq.org/sponsors/">sponsors</a>.
   </footer>
  </div>
 </div>

</body>
</html>

How to resolve this webdriver.py issue?

Update

While debugging webdriver.py further, I printed the response variable right after response = self.execute(Command.NEW_SESSION, parameters):

{'status': 0,
 'value': u'<!DOCTYPE html>\n<html lang="en">\n<head>\n  <meta charset="UTF-8">\n  <link rel="stylesheet" type="text/css" href="/assets/displayhelpservlet.css" media="all"/>\n  <link href="/assets/favicon.ico" rel="icon" type="image/x-icon" />\n  <script src="/assets/jquery-3.1.1.min.js" type="text/javascript"></script>\n  <script src="/assets/displayhelpservlet.js" type="text/javascript"></script>\n  <script type="text/javascript">\n    var json = Object.freeze(\'{"version":"3.4.0","type":"Grid Node","consoleLink":"/wd/hub"}\');\n  </script>\n</head>\n<body>\n\n<div id="content">\n  <div id="help-heading">\n    <h1><span id="logo"></span></h1>\n    <h2>Selenium <span class="se-type"></span>&nbsp;v.<span class="se-version"></span></h2>\n  </div>\n\n  <div id="content-body">\n    <p>\n      Whoops! The URL specified routes to this help page.\n    </p>\n    <p>\n      For more information about Selenium <span class="se-type"></span> please see the\n      <a class="se-docs">docs</a> and/or visit the <a class="se-wiki">wiki</a>.\n      <span id="console-item">\n        Or perhaps you are looking for the Selenium <span class="se-type"></span> <a class="se-console">console</a>.\n      </span>\n    </p>\n    <p>\n      Happy Testing!\n    </p>\n  </div>\n\n  <div>\n    <footer id="help-footer">\n      Selenium is made possible through the efforts of our open source community, contributions from\n      these <a href="https://github.com/SeleniumHQ/selenium/blob/master/AUTHORS">people</a>, and our\n      <a href="http://www.seleniumhq.org/sponsors/">sponsors</a>.\n   </footer>\n  </div>\n </div>\n\n</body>\n</html>'}

Why doesn't it contain a sessionId key-value pair?
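
The value here is the node's help page rather than a session payload, which would explain why there is no sessionId to read. As a rough sketch of how such a response could be recognised, the snippet below looks for the help-page text and pulls out the JSON embedded in the page. `describe_help_page` is a hypothetical helper, and the regular expression is an assumption based only on the Object.freeze line shown above.

```python
import json
import re

# Text that appears in the help page returned by the node.
HELP_PAGE_MARKER = "The URL specified routes to this help page"


def describe_help_page(body):
    """Return the JSON embedded in a Selenium help page, or None.

    Hypothetical helper: returns None when the body is not the help page,
    and an empty dict when the page has no Object.freeze('...') payload.
    """
    if HELP_PAGE_MARKER not in body:
        return None
    match = re.search(r"Object\.freeze\('(\{.*?\})'\)", body)
    return json.loads(match.group(1)) if match else {}


# Shortened copy of the response value from this post.
sample = (
    "<script>var json = Object.freeze('{\"version\":\"3.4.0\","
    "\"type\":\"Grid Node\",\"consoleLink\":\"/wd/hub\"}');</script>"
    "<p>Whoops! The URL specified routes to this help page.</p>"
)
info = describe_help_page(sample)
print(info["type"], info["consoleLink"])
```

The consoleLink value hints that /wd/hub, not /grid/register, may be the path a client is expected to use on this node.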

Update 2

I had partial success running

`driver = webdriver.Remote("http://192.168.43.98:5566/wd/hub", webdriver.DesiredCapabilities.FIREFOX, browser_profile, proxy_object, keep_alive)` 

as the last line of the script. It produced the following error:

Traceback (most recent call last):
  File "C:\Users\User\Documents\RnD\captcha-test\test_geosurf_proxy_gateway.py", line 21, in <module>
    driver = webdriver.Remote("http://192.168.43.98:5566/wd/hub", webdriver.DesiredCapabilities.FIREFOX, browser_profile, proxy_object, keep_alive)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 101, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 193, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 265, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
WebDriverException: Message: The best matching driver provider org.openqa.selenium.ie.InternetExplorerDriver can't create a new driver instance for Capabilities [{proxy={httpProxy=gw1.geosurf.io:8080, proxyType=MANUAL, socksProxy=gw1.geosurf.io:8080, sslProxy=gw1.geosurf.io:8080}, autodetect=false, class=org.openqa.selenium.Proxy, platform=WIN10}]
Build info: version: '3.4.0', revision: 'unknown', time: 'unknown'
System info: host: 'DESKTOP-78JS3VQ', ip: '192.168.43.98', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_131'
Driver info: driver.version: unknown
Stacktrace:
    at org.openqa.selenium.remote.server.DefaultDriverFactory.newInstance (DefaultDriverFactory.java:62)
    at org.openqa.selenium.remote.server.DefaultSession$BrowserCreator.call (DefaultSession.java:222)
    at org.openqa.selenium.remote.server.DefaultSession$BrowserCreator.call (DefaultSession.java:209)
    at java.util.concurrent.FutureTask.run (None:-1)
    at org.openqa.selenium.remote.server.DefaultSession$1.run (DefaultSession.java:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (None:-1)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (None:-1)
    at java.lang.Thread.run (None:-1)

Changing the proxy address to localhost:8080 produced the same error.
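
Note that the capabilities echoed in the error contain no browserName entry, which is what one would expect after replacing webdriver.DesiredCapabilities.FIREFOX with a new dict. That might be why the node picked InternetExplorerDriver as the "best matching" provider, though this is an inference rather than something the log states. A minimal sketch of copying the defaults instead of overwriting them follows; `FIREFOX_DEFAULTS` is a reduced stand-in for the real Selenium dict and `with_extra_capabilities` is a hypothetical helper.

```python
import copy

# Stand-in for webdriver.DesiredCapabilities.FIREFOX
# (assumption: reduced to the one key that matters for this illustration).
FIREFOX_DEFAULTS = {"browserName": "firefox"}


def with_extra_capabilities(defaults, extra):
    """Copy the defaults and layer extra keys on top, leaving the defaults untouched."""
    merged = copy.deepcopy(defaults)
    merged.update(extra)
    return merged


# Replacing the whole dict, as in the script above, drops browserName.
overwritten = {"platform": "WIN10"}
# Copying first keeps it.
merged = with_extra_capabilities(FIREFOX_DEFAULTS, {"platform": "WIN10"})

print("browserName" in overwritten)
print(merged["browserName"])
```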

Update 3

I managed to open the node console manually in a browser at http://192.168.43.98:5566/wd/hub/static/resource/hub.html

Yet the only session I could create was a Chrome browser session.

I had no success creating Firefox or IE 10 browser sessions, although the console offers them for this Grid.

I do not know whether this helps in working out how to configure Grid nodes to use external proxies.


Solution

  • Eventually I was able to open a Chrome browser instance with the following code:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    PROXY = "gw1.geosurf.io:8080" # 8080 - Germany
    
    from selenium.webdriver.common.proxy import Proxy, ProxyType
    proxy_object = Proxy()
    proxy_object.proxy_type = ProxyType.MANUAL
    proxy_object.http_proxy = PROXY
    proxy_object.socks_proxy = PROXY
    proxy_object.ssl_proxy = PROXY
    
    keep_alive = True
    browser_profile=None
    
    capabilities = webdriver.DesiredCapabilities.CHROME.copy()
    capabilities['class'] = "org.openqa.selenium.Proxy"
    capabilities['platform'] = "WINDOWS"
    capabilities['version'] = "10"
    capabilities["autodetect"]= False
    
    driver = webdriver.Remote("http://192.168.43.98:5566/wd/hub", capabilities, browser_profile, proxy_object, keep_alive)
    driver.get('http://testing-ground.scraping.pro/recaptcha')
    raw_input('Press any key to quit Selenium driver: ')
    driver.quit()
    

    Yet the opened browser instance is not able to load any content...
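
    One way to narrow this down might be to check the gateway outside the browser first. The sketch below uses only the standard library and assumes Python 3 (the scripts above are Python 2.7) and an HTTP proxy that needs no extra authentication. `proxy_map` and `build_opener_for` are hypothetical helpers, and the actual request is left commented out so the snippet runs offline.

```python
import urllib.request


def proxy_map(proxy_address):
    """Map both URL schemes to the same HTTP proxy (assumption: an HTTP gateway)."""
    return {
        "http": "http://" + proxy_address,
        "https": "http://" + proxy_address,
    }


def build_opener_for(proxy_address):
    """Return a urllib opener that sends requests through the given proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_address))
    return urllib.request.build_opener(handler)


opener = build_opener_for("gw1.geosurf.io:8080")
# Uncomment to send a real request through the gateway:
# print(opener.open("http://testing-ground.scraping.pro/recaptcha", timeout=10).status)
print(proxy_map("gw1.geosurf.io:8080")["http"])
```

    If the request fails here as well, the problem would more likely lie with the gateway or its credentials than with the Selenium capabilities.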