
Automatically retrieving large files via public HTTP into Google Cloud Storage

For weather processing purposes, I want to automatically retrieve daily weather forecast data into Google Cloud Storage.

The files are available at a public HTTP URL, but they are very large (between 30 and 300 megabytes). File size is the main issue.

After looking at previous Stack Overflow topics, I have tried two methods, both unsuccessful:

1/ First attempt via urlfetch in Google App Engine

    from google.appengine.api import urlfetch

    url = ""
    result = urlfetch.fetch(url)

    [...] # Code to save in a Google Cloud Storage bucket

But I get the following error message on the urlfetch line:

DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL

2/ Second attempt via the Cloud Storage Transfer Service

According to the documentation, it is possible to retrieve HTTP data into Cloud Storage directly via the Cloud Storage Transfer Service.

But it requires the size and MD5 of each file before the download. This option cannot work in my case because the website does not provide that information.

3/ Any ideas?

Do you see any solution for automatically retrieving large files over HTTP into my Cloud Storage bucket?


  • 3/ Workaround with a Compute Engine instance

    Since it was not possible to retrieve large files from an external HTTP source with App Engine or directly with Cloud Storage, I have used a workaround with an always-running Compute Engine instance.

    This instance regularly checks if new weather files are available, downloads them and uploads them to a Cloud Storage bucket.
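
    The download-and-upload step can be sketched roughly as follows. This is not the exact script running on the instance, only a minimal illustration. It assumes the third-party google-cloud-storage client library is installed in a version that provides `Blob.open`; the names `copy_stream` and `fetch_to_bucket`, the 8 MiB chunk size and the 300-second timeout are all hypothetical choices. The file is streamed in chunks, so a 300 MB file never has to fit in memory.

```python
import urllib.request

# Assumed chunk size: 8 MiB keeps memory use small for 30-300 MB files.
CHUNK_SIZE = 8 * 1024 * 1024


def copy_stream(source, sink, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream to a writable one, chunk by chunk.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def fetch_to_bucket(url, bucket_name, blob_name, timeout=300):
    """Stream the file at `url` into gs://bucket_name/blob_name.

    Hypothetical helper: the bucket and object names are supplied by the
    caller, and credentials are taken from the environment (for example
    the service account of the Compute Engine instance).
    """
    # Imported here so that copy_stream stays usable without the library.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Blob.open("wb") returns a file-like writer that uploads in chunks.
        with blob.open("wb") as writer:
            return copy_stream(response, writer)
```

    A call such as `fetch_to_bucket(url, "my-weather-bucket", "forecast.grib")`, with a placeholder bucket and object name, could then be run on a schedule (for example from cron) for each new file.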

    For scalability, maintenance and cost reasons, I would have preferred to use only serverless services, but hopefully: