Tags: python, python-2.7, python-3.x, web-scraping, web-scripting

Read page source before POST


I want to know whether there is a way to read the page source before POSTing parameters, for example to read a captcha value from the page before posting an ID number.

My current code:

import requests
id_number = "1"
url = "http://submitforum.com/page.php"
data = dict(id=id_number, name='Alex')
post = requests.post(url, data=data)

The page has a captcha that changes on every request to http://submitforum.com/page.php (obviously not a real site). I would like to read that value from the page and add it to the "data" variable before posting.


Solution

  • As discussed in the comments on the question, Selenium can be used. Methods without browser emulation may also exist.

    Using Selenium (http://selenium-python.readthedocs.io/) instead of the requests module:

    import re
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    # Captures the value of the "k" query parameter in the reCAPTCHA iframe URL
    regexCaptcha = r"k=([^&]+)"
    url = "http://submitforum.com/page.php"

    # Open the URL in a real browser
    browser = webdriver.Chrome()
    browser.get(url)

    # Example of getting page elements (using CSS selectors)
    # Here, the Google reCAPTCHA ID is extracted if one is present on the current page
    try:
        element = browser.find_element(By.CSS_SELECTOR, 'iframe[src*="https://www.google.com/recaptcha/api2/anchor?k"]')
        captchaID = re.findall(regexCaptcha, element.get_attribute("src"))[0]
        captchaFound = True
        print("Captcha found! " + captchaID)
    except (NoSuchElementException, IndexError):
        # No matching iframe, or no "k" parameter in its URL
        print("No captcha found!")
        captchaFound = False

    # Handle the captcha
    # --> Your handling code; it must set captcha_answer
    captcha_answer = ""  # placeholder: replace with the value your handling code produces

    # Enter the captcha response on the page
    captchaResponse = browser.find_element(By.ID, 'captcha-response')
    captchaResponse.send_keys(captcha_answer)

    # Submit the form
    validateButton = browser.find_element(By.ID, 'submitButton')
    validateButton.click()

    # --> Analysis of the returned page if needed
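
For the case without browser emulation, here is a minimal sketch of "read the page, then POST" with a session. It assumes the per-request value is embedded in the page source as a hidden form input (the field name captcha_token in the test data is hypothetical), and that the site accepts it when it is posted back in the same session. It will not help with an image captcha or reCAPTCHA that has to be solved. The helper names read_hidden_fields, read_then_post and demo are made up for this example; Python 3 only.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def read_hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def read_then_post(session, url, data):
    """GET the page, copy its hidden fields into the form data, then POST.

    `session` is expected to behave like requests.Session: the same session
    is used for both requests so that cookies set by the GET are sent back.
    """
    page = session.get(url)
    payload = read_hidden_fields(page.text)
    payload.update(data)  # values supplied by the caller win over page defaults
    return session.post(url, data=payload)


def demo():
    # Not called automatically: needs network access and the requests package.
    import requests

    with requests.Session() as session:
        response = read_then_post(
            session,
            "http://submitforum.com/page.php",
            {"id": "1", "name": "Alex"},
        )
        print(response.status_code)
```

Using one session for both requests matters because a token that changes on every request is usually tied to a cookie set by the GET; two independent requests.get / requests.post calls would not share it.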