google-cloud-platform google-cloud-storage google-cloud-data-transfer

Google Cloud Transfer Job is creating one extra folder


I have created a Transfer Job to import some of my website's static resources into Google Cloud Storage. The job is supposed to import the data into a bucket named www.pretty-story.com.

It imports the URLs from a TSV file (a URL list).

For instance, the first URL is: https://www.pretty-story.com/wp-includes/js/jquery/jquery.min.js

So I expected the job to create a folder structure starting with wp-includes.

Instead, the job created the folder structure www.pretty-story.com/wp-includes/js/jquery.

Therefore the complete path (including my bucket name) is: www.pretty-story.com/www.pretty-story.com/wp-includes/js/jquery.

How can I tell the transfer job to use the bucket root as the top-level folder, instead of creating a subfolder with the same name?


Solution

  • According to https://cloud.google.com/storage-transfer/docs/create-url-list:

    When an object located at http(s)://[HOSTNAME]:[PORT]/[URL_PATH] is transferred to Cloud Storage, the name of the object in Cloud Storage is [HOSTNAME]/[URL_PATH].

    There is no option to omit the [HOSTNAME]/ prefix, so what you are asking for is not possible with a URL-list transfer.

    If the amount of data involved is reasonable, I recommend downloading it to a workstation and using gsutil to copy it into the bucket without the hostname prefix.
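A minimal Python sketch of that download-and-reupload workaround follows. It swaps in the google-cloud-storage Python client (`storage.Client().bucket(...).blob(...).upload_from_string(...)`) in place of gsutil, and assumes the TSV follows the documented URL-list layout: a `TsvHttpData-1.0` header line, then one URL per line followed by tab-separated size and MD5 fields. The helper names (`transfer_object_name`, `bucket_root_object_name`, `read_url_list`, `copy_without_hostname`) are hypothetical, and only the URL path is kept, so query strings are ignored.

```python
import urllib.parse
import urllib.request


def transfer_object_name(url: str) -> str:
    """Name Storage Transfer Service gives the object: [HOSTNAME]/[URL_PATH]."""
    parts = urllib.parse.urlsplit(url)
    # urlsplit().path keeps its leading "/", so concatenation yields host/path.
    return f"{parts.hostname}{parts.path}"


def bucket_root_object_name(url: str) -> str:
    """Desired name: the URL path only, relative to the bucket root."""
    return urllib.parse.urlsplit(url).path.lstrip("/")


def read_url_list(tsv_text: str) -> list[str]:
    """Return the URL column of a TsvHttpData-1.0 URL list (assumed format)."""
    urls = []
    for line in tsv_text.splitlines():
        line = line.strip()
        # Skip blank lines and the format header line.
        if not line or line.startswith("TsvHttpData-"):
            continue
        urls.append(line.split("\t")[0])
    return urls


def copy_without_hostname(tsv_text: str, bucket_name: str) -> None:
    """Download each listed URL and upload it without the hostname prefix."""
    # Imported here so the pure helpers above work without the client library.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for url in read_url_list(tsv_text):
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
            # Keep the origin's Content-Type; upload_from_string defaults to text/plain.
            content_type = resp.headers.get_content_type()
        blob = bucket.blob(bucket_root_object_name(url))
        blob.upload_from_string(data, content_type=content_type)
```

For the first URL in the question, `transfer_object_name` returns `www.pretty-story.com/wp-includes/js/jquery/jquery.min.js` (the name the transfer job produced), while `bucket_root_object_name` returns `wp-includes/js/jquery/jquery.min.js`. Reading each file fully into memory is only sensible for a modest amount of data, in line with the caveat above.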