I would like to limit the rate of a blob download from Google Cloud Storage in Python.
I could not find any indication that this is possible using the official Python library or the alternative GCSFS library.
My best guess so far would be to download slices of the blob using the start and end arguments of download_as_bytes() and control the timing between slice requests, but 1) I would prefer a built-in solution if one exists, and 2) I am not sure this would be the best approach.
Does anybody have a built-in solution or a better approach?
There is no built-in way to limit the download rate of a blob in Google Cloud Storage with the Python client. You can, however, download the file in chunks yourself and control the timing between chunk requests.
Here's a simple example:
import time
from google.cloud import storage

def download_blob_rate_limited(bucket_name, blob_name, dest_file,
                               chunk_size=1024 * 1024, rate_limit=512 * 1024):
    client = storage.Client()
    # get_blob() fetches the blob's metadata, so blob.size is populated;
    # bucket.blob() alone would leave size as None and break the loop below.
    blob = client.bucket(bucket_name).get_blob(blob_name)
    with open(dest_file, 'wb') as file_obj:
        start = 0
        blob_size = blob.size
        while start < blob_size:
            end = min(start + chunk_size, blob_size)
            # end is inclusive for download_as_bytes(), hence end - 1
            chunk = blob.download_as_bytes(start=start, end=end - 1)
            file_obj.write(chunk)
            # sleep in proportion to the bytes actually downloaded,
            # so the short final chunk is not over-throttled
            time.sleep(len(chunk) / rate_limit)
            start = end

# Usage example
download_blob_rate_limited('my-bucket', 'my-blob', 'local_file.txt',
                           rate_limit=512 * 1024)
This downloads the file in chunks and uses time.sleep() to keep the average rate at or below rate_limit. Note that the sleep is added on top of the time each request itself takes, so the effective rate will be somewhat below the limit.
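If you want the throttle to account for the time the requests themselves take, you can sleep only until the moment the running average would exceed the target rate, instead of sleeping a fixed amount per chunk. Here is a minimal sketch of that idea (the `ByteRateLimiter` name and interface are illustrative, not from any library):

```python
import time

class ByteRateLimiter:
    """Throttle a byte stream: after each chunk, sleep just long enough
    that the average rate since construction stays <= rate_limit B/s."""

    def __init__(self, rate_limit):
        self.rate_limit = rate_limit          # target rate in bytes/second
        self.start_time = time.monotonic()    # when transfer began
        self.bytes_seen = 0                   # total bytes transferred so far

    def throttle(self, n_bytes):
        self.bytes_seen += n_bytes
        # Earliest time at which this many bytes is within the rate limit.
        allowed_at = self.start_time + self.bytes_seen / self.rate_limit
        delay = allowed_at - time.monotonic()
        # If the download itself was slow, delay is negative: no sleep needed.
        if delay > 0:
            time.sleep(delay)
```

In the download loop you would then call `limiter.throttle(len(chunk))` after each `file_obj.write(chunk)` instead of the fixed `time.sleep(chunk_size / rate_limit)`; slow requests consume their own time budget and fast ones are slowed down, so the overall rate tracks the limit more closely.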