python flask screen-scraping mechanize cookiejar

Python web scraping script only runs the first time a user submits a form on a Flask website


Using Flask and Python, I have a website running on localhost which allows the user to select a specific month to download a report for. Based on the selected month, I then import my web scraping file, which retrieves the data from another website (login required). My web scraping script uses Mechanize.

Here is the portion of code where my web scraping file (webscrape.py) is imported after the download button is clicked (the selection is done on office.html):

@app.route('/office/', methods=['GET','POST'])
def office():
    form=reportDownload()
    if request.method=='POST':
        import webscrape
        return render_template('office.html', success=True)
    elif request.method=='GET':
        return render_template('office.html', form=form)

In the render_template call, success=True is passed as an argument so that office.html displays a success message; otherwise (on a GET request) it displays the form for the user's selection. Here is office.html:

{% extends "layout.html" %}
{% block content %}
  <h2>Office</h2>
  {% if success %}
    <p>Report was downloaded successfully!</p>
  {% else %}
    <form action="{{ url_for('office') }}" method="POST">
      <table width="70%" align="center" cellpadding="20">
      <tr>
        <td align="right"><p>Download report for: </p></td>
        <td align="center"><p>Location</p>
                  {{form.location}}</td>
        <td align="center"><p>Month</p> 
                             {{form.month}}  </td>
        <td align="center"><p>Year</p>   
                             {{form.year}}  </td>
      </tr>
      <tr>
        <td></td>
        <td></td>
        <td></td>
        <td align="center">{{form.submit}} </td>
      </tr>
    </table>
   </form>
   {% endif %}
{% endblock %}

The problem I have is when I want to do further downloads, i.e. after downloading for the first time, I go back to the office page and download a report again. On the second try, the success message gets displayed but nothing gets downloaded.

In my web scraping script, which uses mechanize and cookiejar, I have these few lines of code at the beginning:

  br = mechanize.Browser()
  cj = cookielib.LWPCookieJar()
  br.set_cookiejar(cj)

and I proceed with the web scraping.

When I run the web scraping file from my Terminal (or command prompt), the script executes without any problems, even if I run it a second or third time. So I think the problem may be in the website code.

Any suggestions will be appreciated! I have tried different ways of resolving the problem such as using return redirect instead, or trying to clear the cookies in cookiejar. None has worked so far, or I may be using the methods wrongly.

Thank you in advance!


Solution

  • Once your Flask app is started, Python imports each module only once. A module's top-level code runs the first time it is imported; after that the loaded module is cached in sys.modules. So when the app runs into import webscrape for the second time it says “well, I already imported that earlier, so no need to take further action…” and moves on to the next line, rendering the template without actually running the script again.

    In that sense, import in Python is not the same as require in other languages such as PHP; it is closer to PHP's require_once.

    The solution would be to make your scraper an object (class) and instantiate it each time you need it. Move the import to the top of the file, and inside the if request.method=='POST' branch create a new instance of your web scraper and call it to do the download.
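A minimal sketch of that refactoring, under some assumptions: the class name ReportScraper, its download method, and the handle_post helper are hypothetical names chosen for illustration, and a plain dict stands in for the mechanize browser and cookie jar so the example runs without mechanize or Flask installed.

```python
# webscrape.py (sketch): the scraping steps live in a class, not at module level.


class ReportScraper(object):
    """Hypothetical wrapper around the scraping steps; not the original script."""

    def __init__(self, location, month, year):
        self.location = location
        self.month = month
        self.year = year
        # The real script would build its per-run state here, e.g.
        #   br = mechanize.Browser()
        #   cj = cookielib.LWPCookieJar()
        #   br.set_cookiejar(cj)
        # A plain dict stands in for that state (assumption for this sketch).
        self.cookies = {}

    def download(self):
        # Placeholder for the login and report-download steps.
        self.cookies["logged_in"] = True
        return "report-{0}-{1}-{2}".format(self.location, self.year, self.month)


def handle_post(location, month, year):
    """Stand-in for the POST branch of office(): a new scraper per request."""
    scraper = ReportScraper(location, month, year)
    return scraper.download()


# Two submissions in a row both do the work, unlike a repeated `import webscrape`.
first = handle_post("HQ", "01", "2015")
second = handle_post("HQ", "02", "2015")
print(first)
print(second)
```

In the Flask view, import webscrape would then sit at the top of the file, and the POST branch would create the scraper with the values from the submitted form and call its download method before rendering the template. Wrapping the scraping steps in a plain function and calling that function on each POST should work just as well; the important part is that the work no longer runs as a side effect of the import.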