Tags: python, flask, urlretrieve

flask urlretrieve transaction isolation


I'm using Flask to process requests that contain a URL pointing to a document. When a request arrives, the document at that URL is saved to a file. The file is then opened and processed, and a JSON string is generated from the data in the document. The JSON string is sent back in the response.

My question is about requests that arrive very close together. When User1 sends url_1 in a request, the document at url_1 is saved. If User2 sends a request with url_2 before User1's document has been opened, will the JSON string sent to User1 be based on the document at url_2? And how likely is this to happen?

The following picture illustrates the scenario:

[Figure: transaction isolation]

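Here is what the Flask app looks like:

    import urllib.request

    from flask import Flask, request

    import some_module  # your own document-processing module (not shown)

    app = Flask(__name__)

    @app.route("/process_document", methods=['GET'])
    def process_document():
        # Every request writes to the same fixed path in the working directory
        download_location = "document.txt"
        urllib.request.urlretrieve(request.args.get('document_location'), download_location)
        json = some_module.construct_json(download_location)
        return json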
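The interleaving described above can be reproduced without Flask. This is a minimal sketch, assuming two threads stand in for User1 and User2 and plain file writes stand in for urlretrieve. The two threading.Event objects (hypothetical names user1_saved and user2_saved) force the unlucky ordering, so the result is deterministic.

```python
import os
import tempfile
import threading

# A fixed file name shared by every "request", like download_location = "document.txt"
# in the question. It lives in a fresh temp dir so the demo doesn't touch your files.
SHARED_PATH = os.path.join(tempfile.mkdtemp(), "document.txt")

user1_saved = threading.Event()  # set once User1 has saved url_1's document
user2_saved = threading.Event()  # set once User2 has overwritten it with url_2's
results = {}


def user1():
    with open(SHARED_PATH, "w", encoding="utf-8") as f:
        f.write("contents of url_1")
    user1_saved.set()
    # Force the unlucky ordering: don't open the file until User2 has saved too.
    user2_saved.wait()
    with open(SHARED_PATH, encoding="utf-8") as f:
        results["user1"] = f.read()


def user2():
    user1_saved.wait()
    with open(SHARED_PATH, "w", encoding="utf-8") as f:
        f.write("contents of url_2")
    user2_saved.set()


threads = [threading.Thread(target=user1), threading.Thread(target=user2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["user1"])  # contents of url_2
```

Without the events the outcome depends on thread scheduling, which is why this kind of bug tends to appear only under load.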

Solution

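  • If requests are handled concurrently, this situation can happen. Since Flask 1.0 the development server handles requests in separate threads by default, and production WSGI servers typically run several worker threads or processes that share the same working directory, so two requests can end up writing to the same document.txt. How likely it is depends on your traffic and on how long the download and processing take, since that whole span is the window in which another request can overwrite the file. If you must use the local file system, always isolate each request's files, for example in a temporary directory created with tempfile.TemporaryDirectory: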

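    import os
    import urllib.request
    from tempfile import TemporaryDirectory

    from flask import Flask, request

    import some_module  # your own document-processing module (not shown)

    app = Flask(__name__)

    @app.route("/process_document", methods=['GET'])
    def process_document():
        # Each request gets its own private directory, removed when the block exits
        with TemporaryDirectory() as path:
            download_location = os.path.join(path, "document.txt")
            urllib.request.urlretrieve(
                request.args.get('document_location'),
                download_location
            )
            # Build the response while the downloaded file still exists
            json = some_module.construct_json(download_location)
            return json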
    

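    Using a temporary directory or file avoids concurrency issues like the one you describe. It also guards against leftovers: with a fixed file name, an exception in your function leaves document.txt behind, and a later request could accidentally pick up a file from a previous run, whereas the temporary directory is removed even when an exception is raised. It may not help after a hard crash (for example, if the process is killed), in which case the directory can be left behind.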
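The fix can be checked without a web server. This is a minimal sketch under a few assumptions: construct_json below is a hypothetical stand-in for some_module.construct_json, and the documents are local files addressed with file:// URLs (which urlretrieve accepts) so no network access is needed. It sends many interleaved requests from a thread pool and also checks that the temporary directory is removed when processing raises.

```python
import json
import os
import tempfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from tempfile import TemporaryDirectory


def construct_json(path):
    """Hypothetical stand-in for some_module.construct_json."""
    with open(path, encoding="utf-8") as f:
        return json.dumps({"text": f.read()})


def process_document(document_location):
    # Each call downloads into its own private directory, so concurrent
    # calls never share a "document.txt".
    with TemporaryDirectory() as path:
        download_location = os.path.join(path, "document.txt")
        urllib.request.urlretrieve(document_location, download_location)
        return construct_json(download_location)


# Two local "documents", addressed by file:// URLs so no network is needed.
src_dir = Path(tempfile.mkdtemp())
urls = []
for name in ("url_1", "url_2"):
    doc = src_dir / f"{name}.txt"
    doc.write_text(f"contents of {name}", encoding="utf-8")
    urls.append(doc.as_uri())

# Fire 100 interleaved requests from 8 worker threads.
requests_in = urls * 50
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_document, requests_in))

# Each response must match the document its own request pointed to.
expected = [
    json.dumps({"text": "contents of url_1" if u == urls[0] else "contents of url_2"})
    for u in requests_in
]
print(results == expected)  # True

# If processing fails, the temporary directory is still removed.
seen = {}


def failing_process(document_location):
    with TemporaryDirectory() as path:
        seen["path"] = path
        urllib.request.urlretrieve(document_location, os.path.join(path, "document.txt"))
        raise ValueError("simulated processing error")


try:
    failing_process(urls[0])
except ValueError:
    pass

print(os.path.exists(seen["path"]))  # False
```

Because every call has its own directory, the file name inside it can stay fixed; only the directory needs to be unique.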