
How to fill a JavaScript form using Python?


I want to use Python to fill this form.

I tried using Mechanize, but this is a Microsoft Form that uses JavaScript and has no form tag and no GET/POST URL. Maybe BeautifulSoup or Selenium can do this, but I have no experience scraping JS forms. Can anyone suggest how to go about this?

Here's what I've tried. Mechanize is unable to recognize any form on the page:

import mechanize

def main():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response  = br.open("https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u")
    for form in br.forms():
        print("Form name:", form.name) #prints nothing
        print(form) #prints nothing

if __name__ == '__main__':
    main()

Solution

  • Selenium works fine.

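    You'll need to install the components: the selenium package (pip install selenium) and, depending on your Selenium version, a ChromeDriver that matches your installed Chrome.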

    Then this runs:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    url = "https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u"
    driver.get(url)

    # Text question: the input that follows the "NAME" title box
    name = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='NAME']]/following-sibling::*//input")
    name.send_keys("hello, World")

    # Choice question: click the option whose value matches the selection
    section_selection = "F"
    section = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='Section']]/following-sibling::*//input[@value='" + section_selection + "']")
    section.click()

    # Date question: located by its placeholder text
    date = driver.find_element(By.XPATH, "//input[contains(@placeholder, 'Please input date')]")
    date.send_keys("01/12/2020")

    submit = driver.find_element(By.XPATH, "//div[text()='Submit']")
    submit.click()
    

    The XPaths are a little long, but they're based on the question text, so they are potentially stable.
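
Since the two question locators above follow the same pattern, one way to keep them readable is to build the XPath from the question title. This is only a minimal sketch: the helper name input_xpath_for_question is made up for illustration, and it assumes the page keeps the question-title-box class used in the XPaths above and that titles and values contain no single quotes.

```python
def input_xpath_for_question(title, value=None):
    """Build an XPath for the input that follows the question with this title."""
    xpath = (
        "//div[@class='question-title-box']"
        f"[.//span[text()='{title}']]"
        "/following-sibling::*//input"
    )
    # For choice questions, narrow the match to the option with this value
    if value is not None:
        xpath += f"[@value='{value}']"
    return xpath


print(input_xpath_for_question("NAME"))
print(input_xpath_for_question("Section", "F"))
```

The returned strings can be passed to whichever Selenium XPath lookup you are already using.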



    For an alternative approach: when you say there is no POST URL, did you check devtools? Devtools exposes the destination of the form:

    Request URL: https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses
    Request Method: POST
    

    It also exposes the payload. This is the first submit:

    {startDate: "2020-08-17T10:40:18.504Z", submitDate: "2020-08-17T10:40:18.507Z",…}
    answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]"
    startDate: "2020-08-17T10:40:18.504Z"
    submitDate: "2020-08-17T10:40:18.507Z"
    

    The GUIDs in the POST URL and the question IDs seem to be static for this form; they don't change between runs. This is the second run:

    {startDate: "2020-08-17T10:43:48.544Z", submitDate: "2020-08-17T10:43:48.546Z",…}
    answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]"
    startDate: "2020-08-17T10:43:48.544Z"
    submitDate: "2020-08-17T10:43:48.546Z"
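
One detail worth noticing in the captured payload: answers is not a nested list but a JSON string inside the payload. Here is a small sketch using only the standard library, with the values copied from the second run above, that decodes it and maps each question ID to its answer (the variable names are my own):

```python
import json

# Payload from the second run, as shown by devtools
captured = {
    "startDate": "2020-08-17T10:43:48.544Z",
    "submitDate": "2020-08-17T10:43:48.546Z",
    "answers": '[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},'
               '{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},'
               '{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]',
}

# "answers" is itself JSON text, so it needs its own decoding step
answers = json.loads(captured["answers"])
by_question = {a["questionId"]: a["answer1"] for a in answers}

for question_id, value in by_question.items():
    print(question_id, "->", value)
```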
    

    Once you have captured this, you'll probably be able to submit through the API without a GUI.

    Just to make sure, I tried it, and it succeeded:


    import requests
    
    url = "https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses"
    myobj = {"startDate":"2020-08-17T10:48:40.118Z","submitDate":"2020-08-17T10:48:40.121Z","answers":"[{\"questionId\":\"r8f09d63e6f6f42feb2f8f4f8ed3f9389\",\"answer1\":\"Hello again, World\"},{\"questionId\":\"r28fe12073dfa47399f8ce95ae679dccf\",\"answer1\":\"F\"},{\"questionId\":\"r8f9e9fedcc2e410c80bfa1e0e3ef9750\",\"answer1\":\"2020-08-26\"}]"}
    
    x = requests.post(url, data = myobj)
    

    My answers are just hard-coded into the data object, but it seems to work.

    Remember to pip install requests if you don't already have it.
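
If you'd rather not hard-code the whole data object, something like the following could build it. This is only a sketch based on the payloads captured above: build_payload and iso_millis are hypothetical helper names, it assumes the endpoint keeps accepting startDate, submitDate, and a JSON-encoded answers string, and whether the server checks the timestamps is unknown.

```python
import json
from datetime import datetime, timezone


def iso_millis(dt):
    """Format an aware datetime like the captured values, e.g. 2020-08-17T10:48:40.118Z."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def build_payload(answers, start=None, submit=None):
    """Build the form data from a {question_id: answer} mapping."""
    now = datetime.now(timezone.utc)
    return {
        "startDate": iso_millis(start or now),
        "submitDate": iso_millis(submit or now),
        # The captured payload carries the answers as a JSON string, not a list
        "answers": json.dumps(
            [{"questionId": qid, "answer1": value} for qid, value in answers.items()],
            separators=(",", ":"),
        ),
    }


payload = build_payload({
    "r8f09d63e6f6f42feb2f8f4f8ed3f9389": "Hello again, World",
    "r28fe12073dfa47399f8ce95ae679dccf": "F",
    "r8f9e9fedcc2e410c80bfa1e0e3ef9750": "2020-08-26",
})
print(payload)
```

The result can be sent the same way as the hard-coded object, with requests.post(url, data=payload).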