I'm starting to learn how to use the Python requests module. For practice, I tried to solve a challenge/response problem: I want to access the data at http://lema.rae.es/drae/srv/search?val=hacer
With the "Tamper Data" plugin for Firefox I inspected the necessary HTTP requests:
GET http://lema.rae.es/drae/srv/search?val=hacer
POST http://lema.rae.es/drae/srv/search?val=hacer
I copied the exact headers that Firefox sends in the two HTTP requests and implemented the JavaScript "challenge" function in Python. Then I do the following:
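import requests

url = "http://lema.rae.es/drae/srv/search?val=hacer"
headers = { ... }  # headers copied from Firefox's GET request
r1 = requests.get(url=url, headers=headers)
html = r1.content.decode("utf-8")
formdata = challenge(html)  # my Python port of the JavaScript challenge function
headers = { ... }  # headers copied from Firefox's POST request
r2 = requests.post(url=url, data=formdata, headers=headers)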
Unfortunately, the server does not answer in the expected way. I checked all the headers I'm sending via "r.request.headers", and they agree perfectly with the headers that Firefox sends (according to Tamper Data).
What am I doing wrong?
You can inspect my full code here: http://pastebin.com/7JAZ9B4s
These are the response headers I should be getting:
Date[Tue, 10 Feb 2015 17:13:53 GMT]
Vary[Accept-Encoding]
Content-Encoding[gzip]
Cache-Control[max-age=0, no-cache]
Keep-Alive[timeout=5, max=100]
Connection[Keep-Alive]
Content-Type[text/html; charset=UTF-8]
Set-Cookie[TS014dfc77=017ccc203c29467c4d9b347fb56ea0e89a7182e52b9d7b4a1174efbf134768569a005c7c85; Path=/]
Transfer-Encoding[chunked]
And these are the response headers I actually get:
Content-Length[5798]
Content-Type[text/html]
Pragma[no-cache]
Cache-Control[no-cache]
I found the reason why my code doesn't work:
The server expects the POST data in exactly the same order in which the entries appear as input elements of the form. In my code, the values of the input elements were stored in a Python dict. But in the Python version I was using, this data type does not preserve the order in which the values were declared! (A dict is only guaranteed to keep insertion order from Python 3.7 on.)
The Ruby script (referred to in the comments), however, does work, because Ruby's Hash type preserves insertion order (as of Ruby 1.9)!
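The ordering point can be sketched in Python as follows. This is only an illustration, not the code from the pastebin: the field names and values are hypothetical placeholders (the real ones come from the form's input elements), and post_in_order is a helper name made up for this example. The idea is to keep the form data as a list of (name, value) pairs instead of a dict; requests accepts such a list for its data argument.

```python
from urllib.parse import urlencode

# Hypothetical field names and values, for illustration only. The real ones
# come from the <input> elements of the challenge form, in document order.
form_fields = [
    ("first_field", "a"),
    ("second_field", "b"),
    ("third_field", "c"),
]

# A list of (name, value) pairs is encoded in exactly the order given.
body = urlencode(form_fields)
print(body)  # first_field=a&second_field=b&third_field=c


def post_in_order(url, fields, headers=None):
    """POST the fields in list order (hypothetical helper)."""
    # Imported here so the encoding demo above runs without requests installed.
    import requests

    # requests accepts a list of tuples for `data` and keeps their order.
    return requests.post(url, data=fields, headers=headers)
```

With this, the last line of my original snippet would become something like r2 = post_in_order(url, form_fields, headers), with form_fields built in the same order as the input elements on the page.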
Furthermore, reimplementing the JavaScript challenge() function in Python was not necessary at all, because the server happily accepts any response string that worked in the past, over and over again!
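Since an old response can be reused, one possible shortcut is to take a POSTDATA string that worked once (for example, copied from Tamper Data) and resend it unchanged. The sketch below is an assumption about how that could look, not something I ran against the server: the captured string is a hypothetical placeholder, and replay is a made-up helper name. parse_qsl from the standard library splits the string into (name, value) pairs and keeps their order.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical POSTDATA string, standing in for one copied from a capture tool.
captured_body = "first_field=a&second_field=b&third_field=c"

# parse_qsl returns a list of (name, value) pairs in their original order.
pairs = parse_qsl(captured_body, keep_blank_values=True)
print(pairs)  # [('first_field', 'a'), ('second_field', 'b'), ('third_field', 'c')]


def replay(url, fields, headers=None):
    """Resend previously captured form fields in their original order (hypothetical helper)."""
    # Imported here so the parsing demo above runs without requests installed.
    import requests

    return requests.post(url, data=fields, headers=headers)
```

Re-encoding the pairs with urlencode gives back the captured string for simple values like these, which is a quick way to check that the order survived.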