python selenium urllib2 cookielib

Python login page with pop up windows


I want to access web pages and print their source code with Python. Most of them require a login first. I had a similar problem before and solved it with the following code, because those pages had fixed form fields that I could locate. Recently I needed to access another page, but this time the login appears in a pop-up window, so I can't use the same method.

I have tried the Selenium module, but it has to open a browser to do the trick. Is there a method similar to cookielib that lets Python run the code in the background, without visibly opening a browser? Many thanks!

import cookielib
import urllib
import urllib2


# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/ target from the form
authentication_url = 'https://ssl.reddit.com/post/login'

# Input parameters we are going to send
payload = {
  'op': 'login-main',
  'user': '<username>',
  'passwd': '<password>'
  }

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()



Solution

  • You can use Selenium with PhantomJS to get a headless browser. There is also Ghost.py, which uses WebKit to interpret the JavaScript. These two projects help you interact with the JS content of web apps.

    However, I notice that the pop-up comes from an HTTP authentication protocol; here it seems to be NTLM: https://en.wikipedia.org/wiki/NT_LAN_Manager

    So you may want to look at this protocol and build a request based on it, instead of trying to type your login into the pop-up.
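
As a rough illustration of answering an HTTP authentication challenge from code instead of filling in the pop-up, here is a minimal sketch using only the standard library. It is written with the Python 3 module names (`urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`). The helper name `build_auth_opener` and the `intranet.example.com` URL are made up for the example. Note the main assumption: the standard library handlers only cover Basic and Digest authentication, so for a server that really requires NTLM you would have to swap in a third-party NTLM handler in their place.

```python
import http.cookiejar
import urllib.request


def build_auth_opener(top_level_url, username, password):
    """Return (opener, password_mgr) for a site behind HTTP authentication.

    Hypothetical helper: the opener keeps cookies, as in the question's
    code, and answers Basic/Digest challenges (NOT NTLM) for any URL
    located under top_level_url.
    """
    # realm=None means "use these credentials whatever realm the server names"
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)

    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.HTTPBasicAuthHandler(password_mgr),
        urllib.request.HTTPDigestAuthHandler(password_mgr),
    )
    opener.addheaders = [("User-agent", "RedditTesting")]
    return opener, password_mgr


# Usage (performs a real network request, so it is left commented out):
# opener, _ = build_auth_opener("https://intranet.example.com/",
#                               "<username>", "<password>")
# contents = opener.open("https://intranet.example.com/page").read()
# print(contents)
```

When the server replies with 401, the opener retries the request with the stored credentials, so no browser window or pop-up is involved.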