I am trying to complete a form submission on a webpage (http://supermag.jhuapl.edu/mag/?) using MechanicalSoup. Before submitting, a date must be specified within the same form, using drop-down boxes for start day, month, year, time, etc. This can be done with MechanicalSoup's set_select() function, but I cannot seem to access the relevant select tag for each field. A small disclaimer: while I have scientific programming experience, I am new to HTML and the Python libraries mentioned above.
I am unsure which library is best for selecting the date, but with either one I cannot seem to access the relevant select tags. These are child elements of the corresponding span tags within the form and have name attributes such as 'start_day' and 'start_month'.
I have both the mechanicalsoup.Form(form) and mechanicalsoup.StatefulBrowser(*args, **kwargs) objects (the latter giving access to a bs4.BeautifulSoup object for the page) and have tried:
- accessing the select tags with MechanicalSoup's set_select
- finding the span tag and using BeautifulSoup to access the elements below it (in particular the select tags), with the aim of somehow then choosing the value by changing the URL (?)
A snippet of the relevant HTML is shown; note the div tags and the subsequent select tags as children.
The form tag:
<form name="theForm" class="form-horizontal" onsubmit="return false;">
The relevant span and select tags within form:
<span name="start_time">
<div>
<select name="start_day">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>...
</select>
<select style="width: 4em;" name="start_month">
<option value="1">January</option>
<option...
</select>
</div>
</span>
Code is found below:
import mechanicalsoup as ms

# Opening browser and URL
url = "http://supermag.jhuapl.edu/mag/?"
browser = ms.StatefulBrowser()
browser.open(url)
# Assigning bs4.BeautifulSoup object
html = browser.get_current_page()
# Assigning relevant form
form = browser.select_form('form[name="theForm"]')
# Assign correct span tag for e.g. start_time
start_time_span = html.find_all('span')[2]
# Attempt to set start day value - raises
# 'InvalidFormMethod: No select named start_day'
form.set_select({'start_day': '1'})
# Attempt to find select tags with bs4 (filter on the name attribute)
html.find('select', {'name': 'start_day'})
start_time_span.find('select', {'name': 'start_day'})
# and e.g. looking for contents returns empty list
start_time_span.contents
I expected the select tags to be returned by the bs4 find() attempts, or the MechanicalSoup set_select() call to access and set the given select tag when called on the correct form.
The span tag is found within the BeautifulSoup HTML, but it does not seem to have any of the child select tags that are present in the source HTML and are necessary for selecting the date. Calling set_select() raises an error saying that the tag cannot be found.
Thank you in advance; this is my first question on StackOverflow and I hope it meets the guidelines sufficiently well!
To me, your code generally looks fine! When I run your Python snippet on the HTML you quote here, it does not raise an InvalidFormMethod exception. However, when I run it on the URL you provided, I do see that error, because the source HTML served at that URL contains no element with the name start_day.
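One way to check this for yourself is to list the select names that appear in the raw HTML, before any script has run. Below is a minimal sketch using only the standard library's html.parser. The names SelectNameCollector and select_names are made up for illustration, and the `static` string is a hypothetical stand-in for what the server sends, not a copy of the real page.

```python
from html.parser import HTMLParser


class SelectNameCollector(HTMLParser):
    """Collect the name attribute of every <select> in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "select":
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


def select_names(html_text):
    """Return the names of all <select> elements found in html_text."""
    collector = SelectNameCollector()
    collector.feed(html_text)
    return collector.names


# Markup as quoted in the question (elided options left out).
rendered = """
<form name="theForm" class="form-horizontal" onsubmit="return false;">
  <span name="start_time"><div>
    <select name="start_day"><option value="1">1</option></select>
    <select style="width: 4em;" name="start_month"><option value="1">January</option></select>
  </div></span>
</form>
"""

# Hypothetical stand-in for the HTML delivered before any JavaScript runs.
static = """
<form name="theForm" class="form-horizontal" onsubmit="return false;">
  <span name="start_time"></span>
</form>
"""

print(select_names(rendered))  # ['start_day', 'start_month']
print(select_names(static))    # []
```

To try it on the live page, you could pass in str(browser.get_current_page()) from your own snippet. If 'start_day' is missing from the result, MechanicalSoup never receives that field and set_select() has nothing to act on.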
I suspect this is because JavaScript on the page generates the HTML that includes the start_day field. This is hinted at by the form having an onsubmit attribute and no action, and by the page including a lot of JavaScript files (which may or may not be necessary to interact with the form). Depending on what exactly you want to do with this form, you probably need to use a tool that supports JavaScript, like Selenium (MechanicalSoup does not; see the MechanicalSoup FAQ).
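For reference, a rough sketch of what the Selenium route might look like. This is untested against the actual site and rests on several assumptions: that Selenium 4 is installed with Firefox and geckodriver available, that the page's scripts eventually insert select elements named start_day and start_month (as in the HTML you quoted), and that the option values are the strings shown there. The function name set_start_date is invented for this example.

```python
def set_start_date(url, day="1", month="1", timeout=10):
    """Open url in a real browser, pick a start day and month, return the HTML."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are installed
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the drop-down.
        wait = WebDriverWait(driver, timeout)
        day_box = wait.until(
            EC.presence_of_element_located((By.NAME, "start_day"))
        )
        Select(day_box).select_by_value(day)
        # Assumed to appear together with start_day, so no second wait.
        month_box = driver.find_element(By.NAME, "start_month")
        Select(month_box).select_by_value(month)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

You would call it as set_start_date("http://supermag.jhuapl.edu/mag/?", day="2", month="1"). Submitting the form would need a further step, which depends on what the page's JavaScript does when the form is submitted; I have not looked into that.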