python, web-scraping

Problem using the submit method in the scrape.py library


I'm using the scrape.py library to scrape a website (the library and its documentation are at http://zesty.ca/scrape/).

There is a button on the page that I want the session to press, but I don't understand exactly how to use the submit function. As I understand it, I am supposed to pass it a region object for a form. The button itself is an input HTML element. I tried passing both the form and the input, and I get the same error every time.
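For context, submitting an HTML form amounts to collecting the name/value pairs of the form's fields and sending them, URL-encoded, to the form's action URL. The sketch below does that with Python 3's standard library only (`html.parser` and `urllib.parse`). It is not scrape.py's code: the names `FormFields` and `form_post_data` and the sample page are hypothetical, and it ignores `<select>`, `<textarea>` and disabled fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Hypothetical helper: collect the (name, value) pairs a browser would
    send for the <input> elements of the form with the given name attribute."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # names arrive lowercased; bare attributes map to None
        if tag == "form":
            self.in_form = attrs.get("name") == self.form_name
        elif tag == "input" and self.in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind in ("submit", "button", "reset", "image"):
                return  # a browser sends only the button that was clicked; left out here
            if kind in ("checkbox", "radio"):
                if "checked" not in attrs:
                    return  # unchecked boxes are not submitted
                default = "on"  # browsers send "on" when a checked box has no value
            else:
                default = ""
            value = attrs.get("value")
            self.fields.append((attrs["name"], default if value is None else value))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_post_data(html, form_name):
    """Return the URL-encoded body a POST of the named form would carry."""
    parser = FormFields(form_name)
    parser.feed(html)
    parser.close()
    return urlencode(parser.fields)


# Hypothetical page modelled on the question: a form named "form1"
# whose submit button has id="blabla".
SAMPLE = """
<form name="form1" action="/go" method="post">
  <input type="hidden" name="token" value="x y">
  <input type="text" name="q">
  <input type="checkbox" name="opt" checked>
  <input type="submit" id="blabla" name="btn" value="Go">
</form>
"""

print(form_post_data(SAMPLE, "form1"))  # token=x+y&q=&opt=on
```

Clicking a button in a real browser may also run JavaScript that changes these fields or the action URL before anything is sent; a static-HTML approach like this one only ever sees the markup as served.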

My code (running on Google App Engine):

```python
s.go(url)
form = s.doc.first(name="form1")
s.submit(region=form)
```

or

```python
s.go(url)
input = s.doc.first(tagname="input", id="blabla")
s.submit(region=input)
```

and the error:

```
ERROR    2011-05-01 23:37:18,673 __init__.py:427] sequence item 0: expected string, NoneType found
Traceback (most recent call last):
  File "\appengine\ext\webapp\__init__.py", line 636, in __call__
    handler.post(*groups)
  File "main.py", line 135, in post
    s.submit(region=form)
  File "scrape.py", line 342, in submit
    return self.go(url, p, redirects)
  File "scrape.py", line 288, in go
    self.cookiejar)
  File "scrape.py", line 176, in fetch
    data = urlencode(data)
  File "scrape.py", line 409, in urlencode
    for key, value in params.items()]
  File "scrape.py", line 405, in urlquote
    return ''.join(map(urlquoted.get, text))
TypeError: sequence item 0: expected string, NoneType found
```
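The traceback shows the failure happens while scrape.py URL-encodes the form data (`fetch` → `urlencode` → `urlquote`), before any request is sent. The last frame builds each quoted string with `''.join(map(urlquoted.get, text))`, and `dict.get` returns `None` for a key it does not contain, so one way to get exactly this error is a field name or value whose first character is missing from the `urlquoted` table. The Python 3 sketch below reproduces that failure with a hypothetical ASCII-only table (the real table's contents are an assumption here) and shows `urllib.parse.quote` as a standard-library alternative that accepts any character.

```python
from urllib.parse import quote

# Hypothetical stand-in for scrape.py's urlquoted table: ASCII letters and
# digits map to themselves, other ASCII characters to %XX, nothing else is present.
URLQUOTED = {chr(i): chr(i) if chr(i).isalnum() else "%%%02X" % i for i in range(128)}


def urlquote_table(text):
    # Same shape as the failing line: URLQUOTED.get yields None for unknown
    # characters, and str.join rejects None with a TypeError.
    return "".join(map(URLQUOTED.get, text))


def urlquote_stdlib(text):
    # quote() encodes to UTF-8 first, so every character has an encoding.
    return quote(text, safe="")


print(urlquote_table("a b"))      # a%20b
print(urlquote_stdlib("\u00e9"))  # %C3%A9
try:
    urlquote_table("\u00e9")  # "é" is not in the table
except TypeError as exc:
    print(exc)  # sequence item 0: expected str instance, NoneType found
```

Python 2, which the question uses, words the message slightly differently ("expected string"), but the mechanism is the same.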

Solution

  • My assumption is that the button and the form are driven by JavaScript, which scrape.py cannot execute, so it probably couldn't handle this page. A library that supports JavaScript, such as Selenium or Windmill, is needed.
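A minimal sketch of the Selenium route, assuming the Selenium Python bindings and a locally installed browser and driver. Because Selenium drives a real browser, any JavaScript attached to the form or button runs as it would for a user. The function name `submit_via_browser` is hypothetical; it takes an already-created WebDriver so the clicking logic stays separate from browser setup. A real browser cannot run inside the classic App Engine sandbox the question uses, so this would most likely have to run elsewhere.

```python
def submit_via_browser(driver, url, button_id):
    """Load `url` in the browser controlled by `driver` (a Selenium WebDriver),
    click the element whose id is `button_id`, and return the resulting HTML.

    Any JavaScript handlers on the button run, because a real browser does the work.
    """
    driver.get(url)
    # "id" is the value of selenium.webdriver.common.by.By.ID
    button = driver.find_element("id", button_id)
    button.click()
    # If the click starts an asynchronous (JavaScript) navigation, the new page
    # may not have loaded yet; an explicit wait may be needed before reading it.
    return driver.page_source
```

Usage, with Selenium installed: `from selenium import webdriver`, then `driver = webdriver.Firefox()`, `html = submit_via_browser(driver, url, "blabla")`, and finally `driver.quit()`. If the button triggers asynchronous JavaScript, a `selenium.webdriver.support.ui.WebDriverWait` before reading `page_source` makes the result more reliable.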