pythonurllib2urllibcookielib

urllib+cookielib packet manipulation in python


I'm working on a project that will access a specific site to do a search and then I will filter and return the value; the program logs in and then runs the search saving the cookie with a cookie jar to authenticate the connection while it runs the search . However when I run the program it returns no results and the packet header looks completely different. What am I doing wrong that the search always returns no results.

Here is my code:

import cookielib, urllib, urllib2

file= open('results.txt', 'wb')

cj=cookielib.CookieJar()

opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

opener.addheaders=[('Referer', 'http:// site that runs the search/psc/p01ps1/EMPLOYEE/CRM/c/BANNER_TAP.SRCH_ATDO_TAP.GBL')]

opener.addheaders=[('User-Agent', 'Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0')]

posts={'timezoneOffset':'180', 'userid':'user', 'pwd':'password', 'Submit':'Signon'}

data = urllib.urlencode(posts)

opens=opener.open('loginpage.com', data)

print cj

file.write(opens.read())

cjs=str(cj)

posts2 = urllib.urlencode({'ICType':'Panel', 'ICElementNum':0, 'ICStateNum':1, 'ICAction':'SRCH_ATD_TAP_WK_SRCH_PB', 'ICXPos':0, 'ICYPos':0, 'ICFocus':'', 'ICChanged':1, 'ICResubmit':0, 'ICFind':'', 'SRCH_ATD_TAP_WK_MSISDN_TAP':'', 'SRCH_ATD_TAP_WK_CNPJ_TAP':'', 'SRCH_ATD_TAP_WK_STATUS_RA_TAP':'', 'SRCH_ATD_TAP_WK_INTERACTION_ID':'', 'SRCH_ATD_TAP_WK_CASE_ID':48373914, 'SRCH_ATD_TAP_WK_PROTOCOLO_TAP':'', 'SRCH_ATD_TAP_WK_DATA_INI_BAN_TAP':'', 'SRCH_ATD_TAP_WK_HORA_INI_RA_TAP':'', 'SRCH_ATD_TAP_WK_DATA_FIM_BAN_TAP':'', 'SRCH_ATD_TAP_WK_HORA_FIM_BAN_TAP':'', 'SRCH_ATD_TAP_WK_MOTIVO_ID1_TAP':0, 'SRCH_ATD_TAP_WK_MOTIVO_ID2_TAP':0, 'SRCH_ATD_TAP_WK_MOTIVO_ID3_TAP':0, 'SRCH_ATD_TAP_WK_MOTIVO_ID4_TAP':0, 'SRCH_ATD_TAP_WK_MOTIVO_ID5_TAP':0, 'SRCH_ATD_TAP_WK_COMPANY_TYPE_TAP':'','SRCH_ATD_TAP_WK_SUBTIPO_CLI_TAP':''})

url2='searchpage.com'

opens2 = opener.open(url2, posts2) 

str=opens2.read()

print cj

file.write(str + cjs)

file.close()

It connects the first time to the login page to save the cookie and then it connects to the the search page. Again this is just to be used on one site so the connections and post data are very specific.

Again, this code doesn't return any results (after searching the str var which is the entire unfiltered site.

Here are the results I get when scanning the the requests with wireshark, the first one is the site ran in firefox doing the search in a normal browser (including the post data sent) and the second one is my program running and automating the search for me.

POST /psc/p01ps1/EMPLOYEE/CRM/c/BANNER_TAP.SRCH_ATDO_TAP.GBL HTTP/1.1
Host: siteroot
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: site that runs the search/BANNER_TAP.SRCH_ATDO_TAP.GBL #note I wasn't able to create this header.
Cookie: SignOnDefault=my login id; PS_LOGINLIST=http:// siteroot; brux0128-claro-com-br-7090-PORTAL-PSJSESSIONID=dpLmTCpY8vTmj4nMHbpyptPMdvphpRLR!841308261; ExpirePage=http:// siteroot/psp/p01ps1/; PS_TOKEN=AAAAogECAwQAAQAAAAACvAAAAAAAAAAsAARTaGRyAgBOcQgAOAAuADEAMBSfJDUA/BR2T3ekF0/cVhdJ7uJlpgAAAGIABVNkYXRhVnicHYpBCoAgFESfFi2jixRqYrgO2hbWvjN0vw7X5B94bxg+8BjbtBh09v05kJlxpGq1joOd0ksnGxc3KyUS9OSJjHIQPUtlYNLqK52Ya5Li+ABuIwtr; http%3a%2f%2fsiteroot%2fpsp%2fp01ps1%2femployee%2fcrm%2frefresh=list:||||||; PS_360=PS_360_BO_ID_CUST!0!PS_360_CUST_SETID!!PS_360_BO_ID_CONT!0!PS_360_BO_ID_SITE!0!PS_360_CUST_ROLE!0!PS_360_CONT_ROLE!0!PS_360_BO_ID!0!PS_360_VIEW_OPTION!False; PS_TOKENEXPIRE=18_Feb_2014_00:04:41_GMT; HPTabName=DEFAULT
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 683

POST DATA: ICType=Panel&ICElementNum=0&ICStateNum=17&ICAction=SRCH_ATD_TAP_WK_SRCH_PB&ICXPos=0&ICYPos=84&ICFocus=&ICChanged=1&ICResubmit=0&ICFind=&SRCH_ATD_TAP_WK_MSISDN_TAP=&SRCH_ATD_TAP_WK_CNPJ_TAP=&SRCH_ATD_TAP_WK_STATUS_RA_TAP=&SRCH_ATD_TAP_WK_INTERACTION_ID=&SRCH_ATD_TAP_WK_CASE_ID=48373914&SRCH_ATD_TAP_WK_PROTOCOLO_TAP=&SRCH_ATD_TAP_WK_DATA_INI_BAN_TAP=&SRCH_ATD_TAP_WK_HORA_INI_RA_TAP=&SRCH_ATD_TAP_WK_DATA_FIM_BAN_TAP=&SRCH_ATD_TAP_WK_HORA_FIM_BAN_TAP=&SRCH_ATD_TAP_WK_MOTIVO_ID1_TAP=0&SRCH_ATD_TAP_WK_MOTIVO_ID2_TAP=0&SRCH_ATD_TAP_WK_MOTIVO_ID3_TAP=0&SRCH_ATD_TAP_WK_MOTIVO_ID4_TAP=0&SRCH_ATD_TAP_WK_MOTIVO_ID5_TAP=0&SRCH_ATD_TAP_WK_COMPANY_TYPE_TAP=&SRCH_ATD_TAP_WK_SUBTIPO_CLI_TAP=



POST /psc/p01ps1/EMPLOYEE/CRM/c/BANNER_TAP.SRCH_ATDO_TAP.GBL HTTP/1.1
Accept-Encoding: identity
Content-Length: 681
Host: siteroot
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0
Connection: close
Cookie: PS_TOKEN=AAAAogECAwQAAQAAAAACvAAAAAAAAAAsAARTaGRyAgBOcQgAOAAuADEAMBSX+ZILWKx7oU/VKvJbVT8LbueJtwAAAGIABVNkYXRhVnicJYpLCoAwDAWnVVyKF1Hsh2rXgluluvcM3s/DGWNCZh6PALexVY1Bxj4fOzKBkaSW1LCzUVrRwcrJxUKJeHlyRHqxFzomZWCQZlYm5b9Z7gVtawtT; ExpirePage=siteroot; PS_LOGINLIST=siteroot; PS_TOKENEXPIRE=18_Feb_2014_00:08:09_GMT; brux0128-claro-com-br-7090-PORTAL-PSJSESSIONID=QG14TCkJK7PpfRtNH0CSCw9S1m6jtRR9!841308261; SignOnDefault=my login id; http%3a%2f%2fsiteroot%2fpsp%2fp01ps1%2femployee%2fcrm%2frefresh=list:
Content-Type: application/x-www-form-urlencoded

POST DATA: SRCH_ATD_TAP_WK_DATA_INI_BAN_TAP=&SRCH_ATD_TAP_WK_MOTIVO_ID4_TAP=0&ICResubmit=0&ICXPos=0&SRCH_ATD_TAP_WK_DATA_FIM_BAN_TAP=&SRCH_ATD_TAP_WK_PROTOCOLO_TAP=&SRCH_ATD_TAP_WK_SUBTIPO_CLI_TAP=&SRCH_ATD_TAP_WK_MOTIVO_ID3_TAP=0&ICAction=SRCH_ATD_TAP_WK_SRCH_PB&SRCH_ATD_TAP_WK_MOTIVO_ID5_TAP=0&ICElementNum=0&SRCH_ATD_TAP_WK_INTERACTION_ID=&ICType=Panel&SRCH_ATD_TAP_WK_STATUS_RA_TAP=&SRCH_ATD_TAP_WK_COMPANY_TYPE_TAP=&SRCH_ATD_TAP_WK_HORA_FIM_BAN_TAP=&SRCH_ATD_TAP_WK_MOTIVO_ID2_TAP=0&ICFind=&SRCH_ATD_TAP_WK_MOTIVO_ID1_TAP=0&SRCH_ATD_TAP_WK_HORA_INI_RA_TAP=&ICChanged=1&ICStateNum=1&ICYPos=0&ICFocus=&SRCH_ATD_TAP_WK_CASE_ID=48373914&SRCH_ATD_TAP_WK_MSISDN_TAP=&SRCH_ATD_TAP_WK_CNPJ_TAP=

(This is for personal use at the company I work at to make this task more simple which needs to be done around 500 times at this point manualy. it is a site that registers protocols and we need to search the protocols to check if (later will import a list from excel) the protocol is closed of not)

note that I don't have the additional headers but if that could solve the problem I can. And for some reason my post data gets all disorganized ( but from what I understand about post data that shouldn't make a difference) and the cookie information is also somewhhat backwards, but that also shouldn't matter I would assum because to retrieve the cookie info is handled much like a python dictionary.

so I've been breaking my head over this little code and rewritting it several times for the past two weeks and I still can't get it to return the search results. it's also important to note that I won't be able to install the browser core to be able to execute the javascript, but I also don't think that it's necessary do to the fact that the results from the search done on firefox show in wireshark, so the site is downloaded with the result. I was able to get mechanize running, but I havn't been able to try it yet. If there is a way to automate firefox (I don't remember which version at this moment) with python, that is an option that I'm open to. One ore thing, because I'm working on this project at work, I'm not able to use and python plugin that has to be installed. I got mechanize to work because I open and copied the file over, with out running the setup.py. So just to make things easier, I have no way to install libraries.


Solution

  • You don't have PS_360 set in your cookie. Not sure how essential this is, but the best strategy going through these issues is to get step by step identical requests. Probably the first request to get ỳour cookie set was already different, or your browser has cookie data from previous requests, that you need to create manually for your request.