I am trying to scrape some data from http://www.pogdesign.co.uk/cat/.
I want to get the channel and the air-time of each program, but by default they are not displayed. They only appear after I manually configure the settings and save them.
From inspecting the Network tab in Chrome's developer tools, my understanding is that clicking 'Save Settings' first sends a POST request with the relevant form parameters (e.g. 's_networks': 'on', etc.), followed by a GET request that retrieves the HTML page with the channel and air-time displayed.
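A POST followed by a GET can only change what the GET returns if the server recognizes both requests as coming from the same client, usually through a cookie. The sketch below shows that mechanism under a stated assumption: `SettingsHandler` is a hypothetical local stand-in, not the real site, and it stores the `s_networks` preference in a cookie set on its 302 response. The real site may instead keep the settings server-side and look them up by its own cookies.

```python
import threading
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests


class SettingsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: the POST stores a preference in a cookie."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode())
        self.send_response(302)
        if form.get("s_networks") == ["on"]:
            # Remember the preference on the client side, as a cookie.
            self.send_header("Set-Cookie", "s_networks=on; Path=/")
        self.send_header("Location", "/cat/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        cookies = SimpleCookie(self.headers.get("Cookie", ""))
        show = "s_networks" in cookies and cookies["s_networks"].value == "on"
        body = b"channel and air-time" if show else b"default view"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), SettingsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/cat/"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy settings from the environment
    s.post(url, data={"s_networks": "on"})  # the cookie arrives on the 302
    with_session = s.get(url).text  # same session, so the cookie is sent back

with requests.Session() as fresh:
    fresh.trust_env = False
    without_session = fresh.get(url).text  # no cookie, so the defaults are served

server.shutdown()
server.server_close()
print(with_session, "|", without_session)
```

If the real site behaves like this stand-in, a single `requests.Session` reused for both calls is enough; if it does not, the missing piece is whatever cookie the browser already held before the POST.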
I tried to emulate this process (a POST request followed by a GET request) using both Python's requests package and the mechanicalsoup package.
With requests:
```python
import requests

s = requests.Session()
res_post = s.post('http://www.pogdesign.co.uk/cat/', data={'s_networks': 'on'})
res_get = s.get('http://www.pogdesign.co.uk/cat/')
```
With mechanicalsoup:
```python
import mechanicalsoup

mcs = mechanicalsoup.Browser()
res_post = mcs.post('http://www.pogdesign.co.uk/cat/', data={'s_networks': 'on'})
res_get = mcs.get('http://www.pogdesign.co.uk/cat/')
```
However, the response I receive does not contain the channel and air-time data.
The only difference I noticed is that the browser's POST request returns status code 302, while the POST from my Python code returns 200.
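By default, requests follows redirects, and for a 302 it sends the follow-up request as a GET. `status_code` then reports the final 200 while the original 302 is kept in `response.history`, so a 200 from `s.post(...)` does not by itself show that the server skipped the redirect. Passing `allow_redirects=False` returns the raw 302 instead. A minimal sketch against a hypothetical local stand-in server (`RedirectingHandler`, not the real site) that answers POST with a 302:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in: POST answers 302 to /cat/, GET answers 200."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain the form body
        self.send_response(302)
        self.send_header("Location", "/cat/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"calendar page"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/cat/"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy settings from the environment
    followed = s.post(url, data={"s_networks": "on"})  # redirect followed automatically
    raw = s.post(url, data={"s_networks": "on"}, allow_redirects=False)

server.shutdown()
server.server_close()

# Final status, then the status codes of the redirects that were followed
print(followed.status_code, [r.status_code for r in followed.history])
print(followed.request.method)  # method of the final (redirected) request
print(raw.status_code, raw.headers["Location"])  # the unfollowed 302 itself
```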
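The channel and air-time are missing because the settings are tied to the cookies that store the user's info, so the requests have to carry those cookies. You can try the following code: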
```python
import requests

s = requests.Session()

# Settings form fields
data = {
    "style": 3,
    "timezone": "GMT",
    "s_numbers": "on",
    "s_epnames": "on",
    "s_airtimes": "on",
    "s_popups": "on",
    "s_wunwatched": "on",
    "s_sortbyname": "on",
    "s_weekstyle": "on",
    "s_24hr": "on",
    "settings": None,  # note: requests omits None-valued fields from the form body
}

# Fill in these values with the cookie info from the browser's developer tools
cookies = {
    "CAT_UID": "",
    "PHPSESSID": "",
    "_ga": "",
    "_gid": "",
    "_gat": "",
}

# Save the settings, then fetch the page, sending the same cookies both times
post = s.post("http://www.pogdesign.co.uk/cat/", data=data, cookies=cookies)
text = post.text
get = s.get("http://www.pogdesign.co.uk/cat/", cookies=cookies)
text1 = get.text
```
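Two details of the approach above can be checked without contacting the site, by preparing the requests locally instead of sending them. First, form fields whose value is `None` are left out of the encoded body entirely. Second, cookies stored on a `Session` (here via `s.cookies.update`, an alternative to passing `cookies=` on every call) are attached to each request the session prepares. The cookie values below are hypothetical placeholders, not real session identifiers.

```python
import requests

URL = "http://www.pogdesign.co.uk/cat/"

# A subset of the answer's form fields, including the None-valued "settings"
data = {"style": 3, "timezone": "GMT", "s_airtimes": "on", "settings": None}

# Preparing a request performs no network I/O but shows exactly what would be sent
body = requests.Request("POST", URL, data=data).prepare().body
print(body)  # → style=3&timezone=GMT&s_airtimes=on

s = requests.Session()
# Hypothetical placeholder values; copy the real ones from the browser's dev tools
s.cookies.update({"CAT_UID": "example-uid", "PHPSESSID": "example-session"})

# Cookies stored on the session are added to every request it prepares
prepared_get = s.prepare_request(requests.Request("GET", URL))
print(prepared_get.headers["Cookie"])
```

If the site expects the `settings` field to be present, giving it an empty string or the value the browser sends (visible in the Network tab) would include it in the body.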