I need to run a search on this website:
url = r'http://www.cpso.on.ca/docsearch/'
This is an ASPX page (I only started on this yesterday, so apologies for the beginner questions).
Using BeautifulSoup, I can get the __VIEWSTATE and __EVENTVALIDATION values like this:
viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
ev = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']
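For readers without BeautifulSoup to hand, the same extraction can be sketched with only the Python 3 standard library. The sample markup and the `HiddenFieldParser` and `hidden_fields` names are made up for illustration; the real page's fields and values will differ.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Invented sample standing in for the downloaded page.
SAMPLE_HTML = """
<form method="post" action="./">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="AwEAAQ" />
  <input type="text" name="txtLastName" />
</form>
"""

fields = hidden_fields(SAMPLE_HTML)
print(fields["__VIEWSTATE"], fields["__EVENTVALIDATION"])
```

Collecting every hidden field, rather than two by name, means any extra state fields the page happens to carry are picked up as well.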
The headers can be set like this:
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'Accept': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'}
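To show how headers like these attach to a form POST, here is a minimal Python 3 sketch (the post itself uses Python 2's `urllib2`). It only builds the request and sends nothing. The shortened User-Agent string, the placeholder field values and the abbreviated `ctl00$txtLastName` field name are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

url = "http://www.cpso.on.ca/docsearch/"

# Header names as a client sends them. CGI-style names such as
# HTTP_USER_AGENT are server-side variables, not request header names.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
    "Content-Type": "application/x-www-form-urlencoded",
}

# Placeholder form fields; real values come from the downloaded page.
data = {
    "__VIEWSTATE": "dDwtMTA4",
    "ctl00$txtLastName": "smith",
}

# urlencode percent-escapes the "$" in ASP.NET control names as %24.
body = urlencode(data)

# Supplying data makes this a POST; the request is built but not sent.
request = Request(url, data=body.encode("ascii"), headers=headers)
print(request.get_method())
print(body)
```

Inspecting `body` before sending is a cheap way to confirm that the field names and values are what the server expects.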
If you look at the web page, the only values I really want to pass are the first name and last name:
LN = "smith"
FN = "a"
data = {"__VIEWSTATE":viewstate,"__EVENTVALIDATION":ev,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName":LN,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName":FN}
Putting it all together, it looks like this:
import urllib
import urllib2
import urlparse
import BeautifulSoup
url = r'http://www.cpso.on.ca/docsearch/'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup.BeautifulSoup(html)
viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
ev = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'Accept': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'}
LN = "smith"
FN = "a"
data = {"__VIEWSTATE":viewstate,"__EVENTVALIDATION":ev,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName":LN,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName":FN}
data = urllib.urlencode(data)
request = urllib2.Request(url,data,headers)
response = urllib2.urlopen(request)
newsoup = BeautifulSoup.BeautifulSoup(response)
for i in newsoup:
    print i
The problem is that it doesn't seem to return the search results. I don't know whether I need to supply a value for every textbox in the form, or whether I'm just not doing it properly. I thought I had it, but I would expect to see a list of doctors and their contact info.
Any insight is much appreciated. I have used BeautifulSoup before, so I think my problem is in sending the request and including the right information in the data part.
Thanks!
I took the advice from @pguardiario and went the mechanize route, which is much simpler:
import mechanize
url = r'http://www.cpso.on.ca/docsearch/'
request = mechanize.Request(url)
response = mechanize.urlopen(request)
forms = mechanize.ParseResponse(response, backwards_compat=False)
response.close()
form = forms[0]
form['ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName']='Smith'
form['ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtPostalCode']='K1H'
print mechanize.urlopen(form.click()).read()
I am a long way from finishing, but this is getting me a lot further.
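A likely reason the mechanize version gets further (a guess, not something confirmed against this site) is that `form.click()` submits the form's controls as a whole, including the hidden fields and the clicked submit button, whereas the hand-built dictionary sent only four fields. The Python 3 sketch below imitates that idea with the standard library only. The sample markup, the `btnSubmit` and `btnClear` names and the `FormCollector` and `build_payload` helpers are invented for illustration and are not mechanize's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormCollector(HTMLParser):
    """Record default values of inputs and the names of submit buttons."""

    def __init__(self):
        super().__init__()
        self.values = {}    # hidden/text inputs: name -> default value
        self.submits = {}   # submit buttons: name -> label

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if not name:
            return
        kind = (attr.get("type") or "text").lower()
        value = attr.get("value") or ""
        if kind == "submit":
            self.submits[name] = value
        elif kind in ("hidden", "text"):
            self.values[name] = value


def build_payload(html, overrides, submit_name):
    """Start from the form's defaults, apply overrides, add one button."""
    collector = FormCollector()
    collector.feed(html)
    payload = dict(collector.values)
    payload.update(overrides)
    # Only the button that was "clicked" is sent, as a browser would do.
    payload[submit_name] = collector.submits[submit_name]
    return payload


# Invented sample; the real form has many more controls.
SAMPLE_FORM = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="AwEAAQ" />
  <input type="text" name="txtLastName" />
  <input type="text" name="txtPostalCode" />
  <input type="submit" name="btnSubmit" value="Search" />
  <input type="submit" name="btnClear" value="Clear" />
</form>
"""

payload = build_payload(
    SAMPLE_FORM,
    {"txtLastName": "Smith", "txtPostalCode": "K1H"},
    "btnSubmit",
)
print(urlencode(payload))
```

If this guess is right, the urllib2 version would need the same treatment: every field the form defines, plus the name and value of the search button.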