I'm currently working with an API that generates CSV files as its output, and the only way to retrieve them is to run requests.get, such as:
raw_report_data = requests.get(report_url).content.decode('utf-8')
Then we upload these files to a GCP Cloud Storage bucket, and there are multiple ways of doing that according to the GCP documentation.
I'd like to avoid downloading the whole report locally only to upload it to our GCP bucket. I'm aware that requests.get accepts a stream=True argument, which downloads the content gradually, but I can't make it work with a "stream upload" to Cloud Storage.
Here is a code snippet for what I'm trying to do. I'm using a dummy CSV to simplify the API part, so we can focus on the problem:
import requests
from google.cloud import storage
url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"
# GCP info
client = storage.Client(project="my-project")
bucket = client.get_bucket('my-bucket')
target_blob = bucket.blob("test/report_01.csv")
with requests.get(url, stream=True) as f:
    target_blob.upload_from_file(f)
For this code I get the following error:
AttributeError: 'Response' object has no attribute 'tell'
I think that I'm trying to join two incompatible things, but I'd appreciate any ideas, even if it's to tell me that this can't be done.
Extras: regarding the file.read() method, as far as I can tell, it reads the whole document before the upload. My goal is to upload the content while it's being downloaded, to avoid unnecessary use of local storage.

You are getting that error because the Response object returned by the requests library is not a file-like object: it has no tell() method, which upload_from_file calls on the object it is given.
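The mismatch can be checked by asking an object which file-like methods it lacks. This is only an illustrative sketch: the helper name missing_file_methods and the FakeResponse stand-in are hypothetical, and the required method list (read and tell) is inferred from the error message and the question, not taken from the library's source.

```python
import io


def missing_file_methods(obj, required=("read", "tell")):
    """Return the names in `required` that `obj` does not expose as callables."""
    return [name for name in required if not callable(getattr(obj, name, None))]


class FakeResponse:
    """Hypothetical stand-in for requests.Response: holds content, has no read()/tell()."""
    content = b"a,b\n1,2\n"


print(missing_file_methods(FakeResponse()))        # ['read', 'tell']
print(missing_file_methods(io.BytesIO(b"a,b\n")))  # []
```

An object that reports an empty list here is at least shaped like something upload_from_file can consume.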
Based on the requests documentation, you can use response.text to read the response body as text, or response.json() if you are dealing with data in JSON format. If you want the raw stream of bytes, use response.raw and set stream=True when making the request.
Since you are doing a stream upload with upload_from_file, you can try passing response.raw in your code. Here's an example:
import requests
from google.cloud import storage
url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"
# GCP info
client = storage.Client(project="my-project")
bucket = client.get_bucket('my-bucket')
target_blob = bucket.blob("test/report_01.csv")
with requests.get(url, stream=True) as f:
    target_blob.upload_from_file(f.raw)
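If you would rather control the chunking yourself, a possible alternative is to write into the blob through a file-like writer. The sketch below assumes a google-cloud-storage version that provides Blob.open("wb"). The helper names copy_stream and upload_report and the 1 MiB chunk size are my own choices, not part of either library. Setting decode_content on the raw stream is a precaution in case the server compresses the response.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per copy step; an arbitrary choice


def copy_stream(source, sink, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream into a writable one; return bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means the source is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def upload_report(url, project, bucket_name, blob_name):
    """Stream `url` into a Cloud Storage blob without saving it locally."""
    # Imported here so copy_stream stays usable without these packages.
    import requests
    from google.cloud import storage

    client = storage.Client(project=project)
    blob = client.bucket(bucket_name).blob(blob_name)

    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        # Ask urllib3 to undo gzip/deflate content encoding while reading.
        response.raw.decode_content = True
        # Assumption: Blob.open("wb") is available in the installed version.
        with blob.open("wb") as sink:
            return copy_stream(response.raw, sink)
```

With the values from the question, the call would be upload_report(url, "my-project", "my-bucket", "test/report_01.csv"). Only one chunk is held in memory at a time.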