I am building a download manager with tkinter and requests. I recently found out about the stream keyword argument of requests.get(url), which lets me write the content to disk while it is being downloaded. My problem is that when the user downloads multiple files, or just big files, requests seems to stop. The weird part is that it does not raise an error, as if this were expected behavior. Why does this happen, and how can I resolve it? Here is a simplified version of the download without the GUI (I found that this specific URL is particularly prone to the problem):
import requests
import time
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
# headers = r.headers
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
UPDATE: I checked two other Stack Overflow questions that might have had an answer, but they were apparently left unresolved as well (link: Streaming download large file with python-requests interrupting, link: What exactly is Python's file.flush() doing?). I tried both functions suggested there, yet some of the downloads still stop. The new version of the code:
import requests
import time
import os
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        # Added the 'flush' and 'fsync' calls as mentioned in the linked questions
        fp.flush()
        os.fsync(fp.fileno())
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
Even after adding these two calls, requests still seems to stop. I suspect that requests sometimes fails to keep the connection alive, because the problem occurs most often at times of day when my internet connection is weak. But again, I don't understand why it does not raise an error the way urllib does. If that is not the cause, how can I solve this?
I made 3 changes, only one of which directly affects the outcome.
1. I added r.raise_for_status() to check for any errors, just as a good practice.
2. I changed the name computation to name = url.split('/')[-1].split('?')[0], which results in 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4', probably what you want since it has the proper extension.
3. I increased chunk_size 64-fold, which is probably what does the trick.
import requests
import time
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
r.raise_for_status() # check for errors
# headers = r.headers
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = int(headers['Content-Length'])
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096 * 64):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
Prints:
{'Accept-Ranges': 'bytes', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Server,range,Content-Length,Content-Range', 'Cache-Control': 'max-age=8640000', 'Content-Length': '101751914', 'Content-Type': 'video/mp4', 'Date': 'Sat, 21 Nov 2020 18:04:48 GMT', 'Etag': '"5e379fa7-6109c6a"', 'Expires': 'Mon, 01 Mar 2021 18:04:48 GMT', 'Last-Modified': 'Sun, 19 Nov 2000 08:52:00 GMT'}
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
Download 0 % done, avg speed 249 kbps
Download 3 % done, avg speed 1386 kbps
Download 11 % done, avg speed 3443 kbps
Download 19 % done, avg speed 4525 kbps
Download 28 % done, avg speed 5399 kbps
Download 38 % done, avg speed 6218 kbps
Download 50 % done, avg speed 6997 kbps
Download 63 % done, avg speed 7763 kbps
Download 78 % done, avg speed 8463 kbps
Download 89 % done, avg speed 8733 kbps
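As an aside on the second change: splitting on '/' and '?' works for this URL, but the same result can be had from the standard library's URL parser, which also copes with fragments and percent-encoded characters. This is only a minimal sketch; the helper name filename_from_url and the 'download.bin' fallback are my own choices, not part of the original code.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url).path                  # drops '?query' and '#fragment'
    name = posixpath.basename(unquote(path))   # decode %20 etc., keep last segment
    return name or default                     # e.g. 'https://host/' has no file name


print(filename_from_url("https://example.com/videos/clip-1080p.mp4?token=abc"))
```

With the URL from the question this should give the same name as the split-based expression.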
I should add that it also worked for me with the original chunk_size of 4096, albeit much more slowly. In all honesty, I cannot give you a precise reason why it hung for you, but there is certainly no reason not to try the larger (but not unreasonably large) chunk_size that I suggest.
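One plausible reason for a silent stall is that requests applies no timeout unless you pass one, so a connection that stops delivering data simply blocks inside iter_content without raising anything. Below is a sketch, not a definitive fix, that passes timeout= and, because the response headers shown above advertise 'Accept-Ranges': 'bytes', resumes with a Range header after a dropped connection. The function name download_with_resume, the get parameter (there only so the function can be exercised without a network), and the timeout and retry values are assumptions of mine. It also assumes the body is not compressed in transit, so that Content-Length matches the bytes written.

```python
import time

import requests

# Errors that usually mean "the connection stalled or dropped", not "bad URL".
RETRYABLE = (
    requests.exceptions.ConnectionError,       # includes read timeouts while streaming
    requests.exceptions.Timeout,               # timeout on the initial request
    requests.exceptions.ChunkedEncodingError,  # connection broken mid-body
)


def download_with_resume(url, path, get=requests.get, timeout=(5, 30),
                         max_attempts=5, chunk_size=4096 * 64, pause=2):
    """Download url to path, resuming with a Range request after a stall."""
    downloaded = 0
    total = None
    with open(path, "wb") as fp:
        for attempt in range(1, max_attempts + 1):
            # Ask only for the bytes we do not have yet.
            headers = {"Range": f"bytes={downloaded}-"} if downloaded else {}
            try:
                with get(url, headers=headers, stream=True, timeout=timeout) as r:
                    r.raise_for_status()
                    if r.status_code != 206:
                        # Full body (first attempt, or the server ignored
                        # the Range header): start the file from zero.
                        fp.seek(0)
                        fp.truncate()
                        downloaded = 0
                        length = r.headers.get("Content-Length")
                        total = int(length) if length is not None else None
                    for chunk in r.iter_content(chunk_size=chunk_size):
                        downloaded += fp.write(chunk)
                if total is None or downloaded >= total:
                    return downloaded
            except RETRYABLE as exc:
                print(f"attempt {attempt} failed after {downloaded} bytes: {exc!r}")
            time.sleep(pause)  # brief pause before the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts ({downloaded} bytes)")
```

Calling download_with_resume(url, name) would replace the with open(...) loop above. The timeout=(5, 30) tuple means 5 seconds to connect and 30 seconds of silence between received bytes before the read is abandoned, which is what turns a silent hang into an exception you can react to.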
Update
I have tried running the code several times and have found the performance to vary wildly. Despite the chunk_size being specified, the code seems to end up iterating in much smaller chunks. Here is one such run, which nevertheless completes:
{'Accept-Ranges': 'bytes', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Server,range,Content-Length,Content-Range', 'Cache-Control': 'max-age=8640000', 'Content-Length': '101751914', 'Content-Type': 'video/mp4', 'Date': 'Sat, 21 Nov 2020 19:14:13 GMT', 'Etag': '"5e379fa7-6109c6a"', 'Expires': 'Mon, 01 Mar 2021 19:14:13 GMT', 'Last-Modified': 'Sun, 19 Nov 2000 08:52:00 GMT'}
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
Download 0 % done, avg speed 243 kbps
Download 3 % done, avg speed 1240 kbps
Download 12 % done, avg speed 3803 kbps
Download 19 % done, avg speed 4484 kbps
Download 24 % done, avg speed 4601 kbps
Download 29 % done, avg speed 4615 kbps
Download 33 % done, avg speed 4503 kbps
Download 37 % done, avg speed 4411 kbps
Download 40 % done, avg speed 4126 kbps
Download 42 % done, avg speed 3907 kbps
Download 44 % done, avg speed 3674 kbps
Download 45 % done, avg speed 3462 kbps
Download 46 % done, avg speed 3238 kbps
Download 47 % done, avg speed 3061 kbps
Download 47 % done, avg speed 2913 kbps
Download 48 % done, avg speed 2753 kbps
Download 49 % done, avg speed 2613 kbps
Download 49 % done, avg speed 2504 kbps
Download 50 % done, avg speed 2396 kbps
Download 50 % done, avg speed 2286 kbps
Download 51 % done, avg speed 2190 kbps
Download 52 % done, avg speed 2108 kbps
Download 52 % done, avg speed 2035 kbps
Download 53 % done, avg speed 1975 kbps
Download 53 % done, avg speed 1907 kbps
Download 54 % done, avg speed 1859 kbps
Download 55 % done, avg speed 1831 kbps
Download 56 % done, avg speed 1796 kbps
Download 57 % done, avg speed 1759 kbps
Download 58 % done, avg speed 1724 kbps
Download 60 % done, avg speed 1693 kbps
Download 60 % done, avg speed 1663 kbps
Download 61 % done, avg speed 1633 kbps
Download 62 % done, avg speed 1605 kbps
Download 63 % done, avg speed 1580 kbps
Download 64 % done, avg speed 1555 kbps
Download 65 % done, avg speed 1536 kbps
Download 65 % done, avg speed 1515 kbps
Download 66 % done, avg speed 1496 kbps
Download 67 % done, avg speed 1476 kbps
Download 68 % done, avg speed 1456 kbps
Download 69 % done, avg speed 1438 kbps
Download 70 % done, avg speed 1421 kbps
Download 70 % done, avg speed 1405 kbps
Download 71 % done, avg speed 1391 kbps
Download 72 % done, avg speed 1372 kbps
Download 73 % done, avg speed 1357 kbps
Download 73 % done, avg speed 1344 kbps
Download 74 % done, avg speed 1330 kbps
Download 75 % done, avg speed 1320 kbps
Download 76 % done, avg speed 1310 kbps
Download 77 % done, avg speed 1297 kbps
Download 78 % done, avg speed 1289 kbps
Download 79 % done, avg speed 1284 kbps
Download 80 % done, avg speed 1279 kbps
Download 81 % done, avg speed 1275 kbps
Download 83 % done, avg speed 1272 kbps
Download 84 % done, avg speed 1271 kbps
Download 85 % done, avg speed 1270 kbps
Download 87 % done, avg speed 1269 kbps
Download 88 % done, avg speed 1265 kbps
Download 89 % done, avg speed 1260 kbps
Download 89 % done, avg speed 1252 kbps
Download 90 % done, avg speed 1244 kbps
Download 91 % done, avg speed 1237 kbps
Download 92 % done, avg speed 1230 kbps
Download 92 % done, avg speed 1224 kbps
Download 94 % done, avg speed 1214 kbps
Download 95 % done, avg speed 1204 kbps
Download 95 % done, avg speed 1195 kbps
Download 96 % done, avg speed 1186 kbps
Download 97 % done, avg speed 1177 kbps
Download 98 % done, avg speed 1168 kbps
Download 98 % done, avg speed 1160 kbps
Download 99 % done, avg speed 1151 kbps
Download 100 % done, avg speed 1144 kbps
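Note that the "avg speed" figure in these listings is the mean since the start of the download, so a slowdown or stall only shows up as a gradual decline. Here is a small sketch of a meter that also reports the rate over the most recent reporting interval, where a stall shows up immediately as a rate of 0. The class name SpeedMeter and its methods are hypothetical, and the injectable clock is only there to make it testable. Like the code above it divides bytes by 1024, so the unit is really KiB/s.

```python
import time


class SpeedMeter:
    """Track average and most-recent-interval download speed in KiB/s."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = self._last_time = clock()
        self._total = 0        # bytes seen since the start
        self._last_total = 0   # bytes seen at the previous sample

    def add(self, nbytes):
        """Record that nbytes more bytes were written."""
        self._total += nbytes

    def sample(self):
        """Return (average, recent) speeds and start a new interval."""
        now = self._clock()
        elapsed = now - self._start
        interval = now - self._last_time
        average = self._total / elapsed / 1024 if elapsed > 0 else 0.0
        recent = ((self._total - self._last_total) / interval / 1024
                  if interval > 0 else 0.0)
        self._last_time, self._last_total = now, self._total
        return average, recent
```

In the download loop you would call meter.add(fp.write(chunk)) for every chunk and meter.sample() once per second in place of the speed calculation.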
Version Using urllib3
import urllib3
import time
http = urllib3.PoolManager()
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
r = http.request('HEAD', url)
headers = r.headers
print(headers)
r = http.request('GET', url, preload_content=False)
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = int(headers['Content-Length'])
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    for chunk in r.stream(4096 * 64):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
r.release_conn()
Version Using urllib
import time
from urllib.request import urlopen
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
response = urlopen(url)
file_size = int(response.getheader('Content-Length'))
print('File size =', file_size)
name = url.split('/')[-1].split('?')[0]
print(name)
downloaded = 0
start = last_print = time.time()
with open(name, 'wb') as fp:
    while True:
        chunk = response.read(4096 * 64)
        if not chunk:
            break
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} kbps")
            last_print = time.time()
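Unless a global socket default timeout has been set, urlopen also waits indefinitely if the server stops sending. Passing timeout= makes a stalled connect or read raise an exception instead. A sketch, with copy_stream and fetch as hypothetical helper names and 30 seconds as an arbitrary value:

```python
import io
from urllib.request import urlopen


def copy_stream(source, fp, chunk_size=4096 * 64):
    """Copy a file-like object to fp in chunks; return the number of bytes written."""
    downloaded = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty bytes object means end of stream
            break
        downloaded += fp.write(chunk)
    return downloaded


def fetch(url, path, timeout=30):
    """Download url to path; a stalled socket raises instead of hanging."""
    # timeout applies to blocking socket operations such as connect and read
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as fp:
        return copy_stream(response, fp)


# Offline demonstration of the copy loop with an in-memory "response".
print(copy_stream(io.BytesIO(b"x" * 1000), io.BytesIO(), chunk_size=300))  # prints 1000
```

The progress printing from the version above could be added back inside copy_stream; it is left out here to keep the timeout handling visible.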
Version Using urllib with urlretrieve
from urllib.request import urlretrieve
import time
def report_hook(numblocks, blocksize, file_size):
    global start, last_print
    now = time.time()
    if now - last_print >= 1:
        downloaded = numblocks * blocksize
        pct_done = round(downloaded / file_size * 100)
        speed = round(downloaded / (now - start) / 1024)
        print(f"Download {pct_done} % done, avg speed {speed} kbps")
        last_print = now
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
start = time.time()
last_print = start
urlretrieve(url, name, report_hook)
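Two caveats about the hook: urlretrieve passes a block count, so numblocks * blocksize can overshoot the real size on the final block, and the total size is passed as -1 when the server does not send a Content-Length. Here is a sketch of a hook built as a closure, which avoids the globals and handles both cases. The name make_report_hook and its parameters are my own, and the injectable clock and out callable are only there so it can be tried without a download.

```python
import time


def make_report_hook(interval=1.0, clock=time.time, out=print):
    """Build a urlretrieve reporthook that reports at most once per interval."""
    start = last_print = clock()

    def report_hook(numblocks, blocksize, file_size):
        nonlocal last_print
        now = clock()
        if now - last_print < interval:
            return
        last_print = now
        downloaded = numblocks * blocksize
        if file_size > 0:
            downloaded = min(downloaded, file_size)  # last block may overshoot
        elapsed = max(now - start, 1e-9)             # avoid division by zero
        speed = round(downloaded / elapsed / 1024)
        if file_size > 0:
            pct_done = round(downloaded / file_size * 100)
            out(f"Download {pct_done} % done, avg speed {speed} kbps")
        else:
            # Size unknown (urlretrieve passes -1): report bytes only.
            out(f"Downloaded {downloaded} bytes, avg speed {speed} kbps")

    return report_hook
```

It would be used as urlretrieve(url, name, make_report_hook()).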
Versions Using wget
wget is very robust. If you are on Windows, you can download a version here. The first version gets the piped stderr output from wget and displays each line, which looks like:
--2020-11-24 09:20:02-- https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY
Resolving aspb2.cdn.asset.aparat.com (aspb2.cdn.asset.aparat.com)... 91.229.46.35
Connecting to aspb2.cdn.asset.aparat.com (aspb2.cdn.asset.aparat.com)|91.229.46.35|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 101751914 (97M) [video/mp4]
Saving to: 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4'
0K ........ ........ ........ ........ ........ ........ 3% 509K 3m9s
3072K ........ ........ ........ ........ ........ ........ 6% 1.36M 2m5s
6144K ........ ........ ........ ........ ........ ........ 9% 1.94M 96s
9216K ........ ........ ........ ........ ........ ........ 12% 1.14M 88s
12288K ........ ........ ........ ........ ........ ........ 15% 931K 86s
15360K ........ ........ ........ ........ ........ ........ 18% 970K 83s
18432K ........ ........ ........ ........ ........ ........ 21% 1.28M 77s
21504K ........ ........ ........ ........ ........ ........ 24% 1.90M 69s
24576K ........ ........ ........ ........ ........ ........ 27% 2.64M 62s
27648K ........ ........ ........ ........ ........ ........ 30% 2.87M 56s
30720K ........ ........ ........ ........ ........ ........ 34% 2.07M 51s
33792K ........ ........ ........ ........ ........ ........ 37% 1.30M 49s
36864K ........ ........ ........ ........ ........ ........ 40% 713K 49s
39936K ........ ........ ........ ........ ........ ........ 43% 731K 49s
43008K ........ ........ ........ ........ ........ ........ 46% 663K 48s
46080K ........ ........ ........ ........ ........ ........ 49% 657K 48s
49152K ........ ........ ........ ........ ........ ........ 52% 1.01M 45s
52224K ........ ........ ........ ........ ........ ........ 55% 1.76M 41s
55296K ........ ........ ........ ........ ........ ........ 58% 1.49M 37s
58368K ........ ........ ........ ........ ........ ........ 61% 1.32M 34s
61440K ........ ........ ........ ........ ........ ........ 64% 1.20M 31s
64512K ........ ........ ........ ........ ........ ........ 68% 966K 29s
67584K ........ ........ ........ ........ ........ ........ 71% 977K 26s
70656K ........ ........ ........ ........ ........ ........ 74% 857K 24s
73728K ........ ........ ........ ........ ........ ........ 77% 803K 21s
76800K ........ ........ ........ ........ ........ ........ 80% 753K 19s
79872K ........ ........ ........ ........ ........ ........ 83% 842K 16s
82944K ........ ........ ........ ........ ........ ........ 86% 1.14M 13s
86016K ........ ........ ........ ........ ........ ........ 89% 1.79M 10s
89088K ........ ........ ........ ........ ........ ........ 92% 2.21M 7s
92160K ........ ........ ........ ........ ........ ........ 95% 2.19M 4s
95232K ........ ........ ........ ........ ........ ........ 98% 2.45M 1s
98304K ........ ........ 100% 2.54M=88s
2020-11-24 09:21:31 (1.10 MB/s) - 'a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4' saved [101751914/101751914]
The source:
import subprocess
import time
def run_wget(url, outfile):
    cmd = ['wget', '--progress=dot:mega', '-O', outfile, url]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    for stderr_line in iter(p.stderr.readline, ""):
        yield stderr_line
    p.stderr.close()
    return_code = p.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
start = time.time()
for line in run_wget(url, name):
    print(line, end='')
print('Total time:', time.time() - start)
The second version processes the output to produce a listing similar to the other solutions:
a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4
file size = 101751914
Download 3% done, avg speed 384 kbps
Download 6% done, avg speed 570 kbps
Download 9% done, avg speed 672 kbps
Download 12% done, avg speed 703 kbps
Download 15% done, avg speed 732 kbps
Download 18% done, avg speed 784 kbps
Download 21% done, avg speed 857 kbps
Download 24% done, avg speed 895 kbps
Download 27% done, avg speed 884 kbps
Download 30% done, avg speed 868 kbps
Download 34% done, avg speed 885 kbps
Download 37% done, avg speed 818 kbps
Download 40% done, avg speed 818 kbps
Download 43% done, avg speed 849 kbps
Download 46% done, avg speed 885 kbps
Download 49% done, avg speed 920 kbps
Download 52% done, avg speed 929 kbps
Download 55% done, avg speed 937 kbps
Download 58% done, avg speed 946 kbps
Download 61% done, avg speed 957 kbps
Download 64% done, avg speed 878 kbps
Download 68% done, avg speed 696 kbps
Download 71% done, avg speed 611 kbps
Download 74% done, avg speed 564 kbps
Download 77% done, avg speed 550 kbps
Download 78% done, avg speed 543 kbps
Download 80% done, avg speed 526 kbps
Download 83% done, avg speed 534 kbps
Download 86% done, avg speed 542 kbps
Download 89% done, avg speed 548 kbps
Download 92% done, avg speed 553 kbps
Download 95% done, avg speed 556 kbps
Download 98% done, avg speed 557 kbps
Download 100% done, avg speed 563 kbps
Total time: 176.51619601249695
The source:
import subprocess
import time
import re
def run_wget(url, outfile):
    cmd = ['wget', '--progress=dot:mega', '-O', outfile, url]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    for stderr_line in iter(p.stderr.readline, ""):
        yield stderr_line
    p.stderr.close()
    return_code = p.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
name = url.split('/')[-1].split('?')[0]
print(name)
file_size = None
start = time.time()
for line in run_wget(url, name):
    if file_size is None:
        # Still waiting for the 'Length: <bytes>' line that wget prints first
        m = re.match(r'Length: (\d+)', line)
        if m:
            file_size = int(m[1])
            print('file size =', file_size)
    else:
        # Progress lines contain the percentage completed
        m = re.search(r'(\d+)%', line)
        if m:
            pct_done = int(m[1])
            downloaded = file_size / 100 * pct_done
            elapsed = time.time() - start
            speed = round(downloaded / elapsed / 1024)
            print(f"Download {pct_done}% done, avg speed {speed} kbps")
print('Total time:', time.time() - start)
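Part of what makes wget robust on a flaky connection is that it retries on its own and gives up on reads that go silent. If you want to control that behavior rather than rely on the defaults, the command list in run_wget could be built with explicit settings. The following is only a sketch: build_wget_cmd and the chosen values are my assumptions, while --tries, --read-timeout and --waitretry are standard GNU wget options.

```python
def build_wget_cmd(url, outfile, tries=10, read_timeout=30, wait_retry=5):
    """Return a wget argument list with explicit retry and timeout settings."""
    return [
        'wget',
        '--progress=dot:mega',
        f'--tries={tries}',                # how many times to attempt the download
        f'--read-timeout={read_timeout}',  # seconds of silence before a read counts as failed
        f'--waitretry={wait_retry}',       # upper bound on the pause between retries
        '-O', outfile,
        url,
    ]


print(build_wget_cmd('https://example.com/file.mp4', 'file.mp4'))
```

In run_wget you would then write cmd = build_wget_cmd(url, outfile) in place of the hard-coded list.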