I am new to Python and I am trying to create a Flask web service.
The service I've created is working fine for serial requests but when multiple requests come, it shows some weird behavior.
In my actual application, I observed that when multiple requests arrive in quick succession (one closely followed by another), the later calls wait for the first call to complete and only then execute in parallel.
I tried to reproduce this behavior and I got some success.
Here's the sample code:
from flask import Flask
from waitress import serve
import datetime
import pandas as pd

app = Flask(__name__)


@app.route("/get_data/")
def get_data():
    print("Request received at: " + str(datetime.datetime.now()))
    # Raw string so the backslash is not read as an escape sequence
    file_path = r"D:\15MB_Data.csv"
    t1 = datetime.datetime.now()
    data_frame = pd.read_csv(file_path, sep=",")
    print("Starting processing after reading file at: " + str(datetime.datetime.now()))
    doSomeHeavyWork()
    t2 = datetime.datetime.now()
    init_time = 'Time in Initialization : ' + str((t2 - t1)) + ". Completed at " + str(datetime.datetime.now())
    print(init_time)
    return init_time


def doSomeHeavyWork():
    # Busy-wait for 5 seconds to simulate CPU-bound work
    current_time_plus5 = datetime.datetime.now() + datetime.timedelta(0, 5)
    while datetime.datetime.now() < current_time_plus5:
        i = 0
    return 1


serve(app, host="0.0.0.0", port=5002)
To call the get_data function I use:
http://127.0.0.1:5002/get_data
And the output of this sample code is:
Request received at: 2021-11-10 15:26:32.482249
Starting processing after reading file at: 2021-11-10 15:26:32.875022
Request received at: 2021-11-10 15:26:33.112884
Request received at: 2021-11-10 15:26:33.485669
Request received at: 2021-11-10 15:26:33.804485
Starting processing after reading file at: 2021-11-10 15:26:36.032274
Starting processing after reading file at: 2021-11-10 15:26:36.438055
Starting processing after reading file at: 2021-11-10 15:26:37.089661
Time in Initialization : 0:00:05.444947. Completed at 2021-11-10 15:26:37.927196
Time in Initialization : 0:00:07.936492. Completed at 2021-11-10 15:26:41.049376
Time in Initialization : 0:00:08.023461. Completed at 2021-11-10 15:26:41.509130
Time in Initialization : 0:00:08.302279. Completed at 2021-11-10 15:26:42.120756
From the output we see that the second request was received at 2021-11-10 15:26:33.112884 but only reached the processing step at 2021-11-10 15:26:36.032274 (a wait of about 3 seconds), and the other calls were held up until then as well.
I am using PyCharm on Windows 10 for development purposes. I thought it could be due to the development environment so I tried hosting the application on Ubuntu also using NGINX, but still no luck.
What should I do to execute new requests without waiting for the previous ones to complete?
waitress runs requests on a thread pool, defaulting to 4 threads. Since the logs show only 4 requests, the pool size isn't the limit here.
doSomeHeavyWork implements a busy wait with no I/O or sleeps. Because of the GIL, only one thread can execute Python bytecode at a time, so pure-Python work in one thread blocks the others (unlike I/O or most low-level libraries, which release the GIL while they run). If the function were instead doing I/O or calling pandas/numpy-type operations (many, but not all, of which release the GIL), rather than busy-waiting, the block time would be much less.
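To see the effect outside Flask, here is a minimal sketch that times the same number of threads doing pure-Python work versus sleeping. The names cpu_bound, io_like, and timed_in_threads are made up for this illustration, time.sleep is only a stand-in for real I/O, and the numbers depend on your machine. It assumes a standard CPython build with the GIL enabled.

```python
import threading
import time


def cpu_bound(n=2_000_000):
    """Fixed amount of pure-Python work; the thread holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i
    return total


def io_like(seconds=0.2):
    """Stand-in for I/O (an assumption): time.sleep releases the GIL while waiting."""
    time.sleep(seconds)


def timed_in_threads(worker, n_threads=4):
    """Run `worker` in n_threads threads at once and return the wall-clock seconds."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    start = time.perf_counter()
    cpu_bound()
    single = time.perf_counter() - start
    print(f"1 cpu_bound call, 1 thread  : {single:.2f}s")
    print(f"4 cpu_bound calls, 4 threads: {timed_in_threads(cpu_bound):.2f}s")
    print(f"4 io_like calls, 4 threads  : {timed_in_threads(io_like):.2f}s")
```

On a GIL build, the four cpu_bound threads should take roughly four times as long as a single call, because they take turns running. The four io_like threads should finish in about the time of one sleep, since they wait concurrently.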