I am building a basic web scraper that gets train ticket prices from National Rail using Python and MechanicalSoup.
I am trying to fill out the journey-planner form with basic journey data (start and end stations, plus a date and time) so that I can get the ticket prices for a specific train journey.
Here is the code I have used to fill out the form:
import mechanicalsoup
# Create a stateful browser (it keeps the current page and session cookies between calls)
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://www.nationalrail.co.uk/")
# Select the journey-planner form by its action URL
browser.select_form('form[action="http://ojp.nationalrail.co.uk/service/planjourney/plan"]')
# Basic parameters (start and end stations, date and time)
browser["from.searchTerm"] = "Norwich"
browser["to.searchTerm"] = "London Liverpool Street"
browser["timeOfOutwardJourney.monthDay"] = "28/11/2018"
browser["timeOfOutwardJourney.hour"] = "13"
browser["timeOfOutwardJourney.minute"] = "15"
browser["_checkbox"] = "off"
# Show the current page (with the filled-in form) in a local web browser, for debugging
browser.launch_browser()
# Submit the form
response = browser.submit_selected()
# Print the response
print(response)
The problem is that when the form is submitted, it returns <Response [400]>. My research suggests that the form is filled out incorrectly. However, when browser.launch_browser() runs and opens my browser, all the fields appear to be filled out correctly, and if I press submit myself, the form submits correctly and the page of ticket prices opens.
Does anyone know what I am doing wrong?
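One way to narrow down where a 400 like this comes from is to look at the redirect chain that requests followed after the form was submitted: response.history holds the intermediate redirect responses, oldest first. The helper below is a hypothetical debugging sketch (describe_redirects is not part of MechanicalSoup or requests); it assumes the object returned by submit_selected() is a requests.Response.

```python
import requests


def describe_redirects(response: requests.Response):
    """Return (status code, URL, Location header) for every hop, oldest first.

    Hypothetical debugging helper: response.history holds the redirect
    responses requests followed, and the final response is appended last.
    """
    hops = list(response.history) + [response]
    return [(r.status_code, r.url, r.headers.get("Location")) for r in hops]
```

After response = browser.submit_selected(), printing each tuple from describe_redirects(response) shows whether the 400 belongs to the form submission itself or to a URL that was reached through a redirect.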
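This happens only in Python 3, and the problem is in requests: the redirect URL sent by the server ends in whitespace, which requests percent-encodes as %09 (the encoding of a tab character) instead of removing it. You can see this by printing the final URL: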
print(response.url)
# http://www.nationalrail.co.uk/times_fares/109179.aspx%09%09%09%09
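The %09%09%09%09 suffix can be reproduced without any network access. This sketch assumes the server's Location header ended in four tab characters (which is what the printed URL suggests) and uses the standard library's urllib.parse.quote as a stand-in for the percent-encoding requests applies when it prepares the redirected request.

```python
from urllib.parse import quote

# Assumed Location value: the redirect target with four trailing tab characters.
location = "http://www.nationalrail.co.uk/times_fares/109179.aspx\t\t\t\t"

# ':' and '/' are kept as-is; each tab (0x09) becomes %09.
encoded = quote(location, safe=":/")
print(encoded)  # http://www.nationalrail.co.uk/times_fares/109179.aspx%09%09%09%09

# Stripping surrounding whitespace before encoding leaves a clean URL.
cleaned = quote(location.strip(), safe=":/")
print(cleaned)  # http://www.nationalrail.co.uk/times_fares/109179.aspx
```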
You can patch it: open
python_dir\Lib\site-packages\requests\sessions.py
go to line 114 (the exact line number depends on your requests version), and replace
location = location.encode('latin1')
with
location = location.strip().encode('latin1')
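Edits to files under site-packages are lost whenever requests is upgraded. A possible alternative, sketched here under the assumption that your requests version has Session.get_redirect_target (in recent versions, that method contains the location.encode('latin1') line), is to subclass requests.Session and strip the redirect target there. The class name StrippingSession is hypothetical.

```python
import requests


class StrippingSession(requests.Session):
    """requests.Session that removes surrounding whitespace from redirect targets.

    Hypothetical helper: trailing tabs in a Location header would otherwise
    end up percent-encoded as %09 in the redirected URL.
    """

    def get_redirect_target(self, resp):
        # Let requests extract and decode the Location header as usual.
        location = super().get_redirect_target(resp)
        # None means "not a redirect"; otherwise drop surrounding whitespace.
        return location.strip() if location is not None else None
```

MechanicalSoup accepts an existing session when the browser is created, so this would be wired in as browser = mechanicalsoup.StatefulBrowser(session=StrippingSession()); redirects followed by browser.submit_selected() then go through the overridden method.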