How to move/copy files from ADLS Containers to GCS Bucket?


I have some files in ADLS containers and I need to copy or move them to a GCS bucket at a regular interval.

Can someone suggest what options are currently available to achieve this?


Solution

  • After a quick search of the GCP documentation, I found that Google's native tool for this task is Storage Transfer Service (STS) [1].

    According to the documentation, depending on your source type you can either create and run Google-managed transfers, or configure self-hosted transfers that give you full control over network routing and bandwidth usage.

    The documentation states that Azure Blob Storage, including Azure Data Lake Storage Gen2, is a supported source [2].

    You configure access to the source data in Microsoft Azure Storage using shared access signatures (SAS) [3]. Specifically, you need to create a SAS token at the container level; see "Grant limited access to Azure Storage resources using shared access signatures (SAS)" for instructions [4].

    You can save your Azure SAS token in Google Secret Manager [5].

    Access to GCS is managed through a service account: the account used with Storage Transfer Service needs permissions on STS jobs [6], permission to use the Secret Manager API, and, of course, write access to the destination GCS bucket.

    That is the native option. Alternatively, there are third-party tools such as Apache NiFi; in that case you will need access credentials for both the source and the destination storage systems.

    [1] Storage Transfer Service overview: https://cloud.google.com/storage-transfer/docs/overview#:~:text=Storage%20Transfer%20Service%20automates,to%20write%20any%20code.

    [2] STS supported sources and sinks: https://cloud.google.com/storage-transfer/docs/sources-and-sinks#:~:text=Azure%20Blob%20Storage%2C%20including%20Azure%20Data%20Lake%20Storage%20Gen2

    [3] STS: configure access to Microsoft Azure Storage (SAS): https://cloud.google.com/storage-transfer/docs/source-microsoft-azure#:~:text=You%20can%20configure%20access%20to%20source%20data%20in%20Microsoft%20Azure%20Storage%20using%20shared%20access%20signatures%20(SAS).

    [4] Azure Storage: grant limited access with shared access signatures (SAS): https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview

    [5] STS: store the Azure SAS token in Secret Manager: https://cloud.google.com/storage-transfer/docs/source-microsoft-azure#secret_manager

    [6] STS access control, job permissions: https://cloud.google.com/storage-transfer/docs/access-control#jobs
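For the "certain interval" part of the question, an STS job can carry a schedule. Below is a minimal sketch that assumes you call the transferJobs.create REST method (POST https://storagetransfer.googleapis.com/v1/transferJobs) yourself. It only builds the JSON request body and sends nothing. The field names follow the STS v1 REST reference as I understand it; the helper name build_azure_to_gcs_job and all example values are hypothetical. Setting move=True maps to transferOptions.deleteObjectsFromSourceAfterTransfer, which turns the copy into a move. Whether that option applies to Azure sources, and whether the SAS token then also needs Delete permission, is worth confirming in [3].

```python
import datetime
import json
from typing import Any, Dict, Optional


def build_azure_to_gcs_job(
    project_id: str,
    storage_account: str,
    container: str,
    credentials_secret: str,
    sink_bucket: str,
    repeat_every_hours: int = 24,
    source_path: str = "",
    move: bool = False,
    start_date: Optional[datetime.date] = None,
) -> Dict[str, Any]:
    """Build a JSON body for STS transferJobs.create (Azure/ADLS Gen2 -> GCS).

    Hypothetical helper: nothing is sent to Google Cloud, the caller POSTs
    the returned dict.
    """
    # As I read the STS reference, repeatInterval may not be under one hour.
    if repeat_every_hours < 1:
        raise ValueError("repeat_every_hours must be at least 1")
    # STS treats the path as an object prefix that must end with '/'.
    if source_path and not source_path.endswith("/"):
        source_path += "/"
    start = start_date or datetime.date.today()
    return {
        "projectId": project_id,
        "description": f"{storage_account}/{container} to gs://{sink_bucket}",
        "status": "ENABLED",
        "schedule": {
            # google.type.Date fields: date of the first run.
            "scheduleStartDate": {
                "year": start.year,
                "month": start.month,
                "day": start.day,
            },
            # Duration in JSON form, e.g. "21600s" for every 6 hours.
            "repeatInterval": f"{repeat_every_hours * 3600}s",
        },
        "transferSpec": {
            "azureBlobStorageDataSource": {
                "storageAccount": storage_account,
                "container": container,
                "path": source_path,
                # Secret Manager resource holding the SAS token, used instead
                # of an inline azureCredentials.sasToken.
                "credentialsSecret": credentials_secret,
            },
            "gcsDataSink": {"bucketName": sink_bucket},
            "transferOptions": {
                # True turns the copy into a move (see the caveat above).
                "deleteObjectsFromSourceAfterTransfer": move,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example values.
    body = build_azure_to_gcs_job(
        project_id="my-gcp-project",
        storage_account="mydatalake",
        container="raw",
        credentials_secret="projects/123456789012/secrets/azure-sas",
        sink_bucket="my-dest-bucket",
        repeat_every_hours=6,
        source_path="exports",
    )
    print(json.dumps(body, indent=2))
```

The returned dict can be sent as the JSON body of an authenticated POST, for example with google-auth's AuthorizedSession; that wiring is deliberately left out so the sketch stays self-contained.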
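The answer notes that ADLS Gen2 is a supported STS source. STS describes an Azure source as a storage account, a container and an optional path prefix, while ADLS Gen2 locations are usually written as abfss://<container>@<account>.dfs.core.windows.net/<path>. Below is a small, hypothetical helper that translates one into the other. It assumes the public Azure cloud endpoint suffix (.dfs.core.windows.net) and that the STS path prefix must end with '/'.

```python
from typing import Dict
from urllib.parse import urlsplit

# Assumption: public Azure cloud; sovereign clouds use other suffixes.
DFS_SUFFIX = ".dfs.core.windows.net"


def abfss_to_sts_source(uri: str) -> Dict[str, str]:
    """Map abfss://<container>@<account>.dfs.core.windows.net/<path> to the
    storageAccount / container / path fields of an STS Azure source."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    host = parts.hostname or ""  # urlsplit lower-cases the hostname
    if (
        not parts.username
        or not host.endswith(DFS_SUFFIX)
        or len(host) <= len(DFS_SUFFIX)
    ):
        raise ValueError(f"expected <container>@<account>{DFS_SUFFIX}: {uri}")
    # Drop the leading '/' and make a non-empty prefix end with '/'.
    path = parts.path.lstrip("/")
    if path and not path.endswith("/"):
        path += "/"
    return {
        "storageAccount": host[: -len(DFS_SUFFIX)],
        "container": parts.username,
        "path": path,
    }


if __name__ == "__main__":
    print(abfss_to_sts_source("abfss://raw@mydatalake.dfs.core.windows.net/sales/2024"))
```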
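Before storing the container-level SAS token mentioned in the answer, it can help to sanity-check it locally. This sketch only parses the token's query parameters: sr=c marks a container-level SAS, sp lists the granted permission letters, and sig is the signature. My assumption, based on [3], is that the token needs at least Read (r) and List (l), plus Delete (d) if you also delete from the source. The function name check_container_sas is hypothetical, and it does not verify the signature or the expiry.

```python
from typing import List
from urllib.parse import parse_qs


def check_container_sas(sas_token: str, need_delete: bool = False) -> List[str]:
    """Return problems found in a SAS token's query parameters.

    An empty list means the token *looks* usable; the signature and expiry
    are not checked here.
    """
    params = parse_qs(sas_token.lstrip("?"))
    problems: List[str] = []
    # sr=c marks a container-level SAS.
    if params.get("sr", [""])[0] != "c":
        problems.append("sr is not 'c' (not a container-level SAS)")
    # sp holds the permission letters, e.g. "rl" = read + list.
    granted = params.get("sp", [""])[0]
    required = "rl" + ("d" if need_delete else "")  # assumption, see text
    missing = "".join(p for p in required if p not in granted)
    if missing:
        problems.append(f"sp is missing permission(s): {missing}")
    if "sig" not in params:
        problems.append("no sig parameter (is this a full SAS token?)")
    return problems


if __name__ == "__main__":
    token = "?sv=2022-11-02&sr=c&sp=rl&se=2030-01-01T00:00:00Z&sig=REDACTED"
    print(check_container_sas(token) or "looks OK")
    print(check_container_sas(token, need_delete=True))
```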
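For the Secret Manager step [5], my understanding is that STS expects the secret payload to be a JSON object of the form {"sas_token": "<token>"}, and that the transfer job references it as projects/<PROJECT_NUMBER>/secrets/<SECRET_ID> (the numeric project number, not the project ID). Below is a minimal sketch of both pieces with hypothetical helper names. Stripping a leading '?' from the token is my own assumption, since the Azure portal may display the token with one.

```python
import json
import re


def sas_secret_payload(sas_token: str) -> bytes:
    """Secret Manager payload in the {"sas_token": ...} JSON shape."""
    # Assumption: drop a leading '?' as copied from the Azure portal.
    return json.dumps({"sas_token": sas_token.lstrip("?")}).encode("utf-8")


def credentials_secret_name(project_number: str, secret_id: str) -> str:
    """Value for azureBlobStorageDataSource.credentialsSecret."""
    if not project_number.isdigit():
        raise ValueError("use the numeric project number, not the project ID")
    # Secret IDs: letters, digits, '-' and '_', at most 255 characters.
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,255}", secret_id):
        raise ValueError(f"invalid secret ID: {secret_id!r}")
    return f"projects/{project_number}/secrets/{secret_id}"


if __name__ == "__main__":
    print(sas_secret_payload("?sv=2022-11-02&sr=c&sp=rl&sig=REDACTED").decode())
    print(credentials_secret_name("123456789012", "azure-sas"))
```

The payload bytes can then be uploaded with something like `gcloud secrets create azure-sas --data-file=-`, piping them on stdin, and the resource name goes into the job's credentialsSecret field.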
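For the permissions paragraph: Google-managed transfers run as a Google-managed STS service agent whose address, as far as I know, has the form project-<PROJECT_NUMBER>@storage-transfer-service.iam.gserviceaccount.com. The sketch below only prints gcloud commands that grant that agent access to the secret (roles/secretmanager.secretAccessor) and to the destination bucket. The two bucket roles are an assumption, chosen to cover bucket reads plus object create/list/delete; check [6] and the STS sink documentation for the exact minimum. Separately, the user or account that creates the job needs an STS role such as roles/storagetransfer.user.

```python
from typing import List

# ASSUMPTION: bucket roles picked to cover bucket metadata reads and
# object create/list/delete; narrower roles may be enough (see [6]).
BUCKET_ROLES = ("roles/storage.objectAdmin", "roles/storage.legacyBucketReader")


def sts_service_agent(project_number: str) -> str:
    """Email of the Google-managed STS service agent for a project."""
    return f"project-{project_number}@storage-transfer-service.iam.gserviceaccount.com"


def grant_commands(project_number: str, bucket: str, secret_id: str) -> List[str]:
    """gcloud commands granting the service agent bucket and secret access."""
    member = f"serviceAccount:{sts_service_agent(project_number)}"
    cmds = [
        f"gcloud storage buckets add-iam-policy-binding gs://{bucket} "
        f"--member={member} --role={role}"
        for role in BUCKET_ROLES
    ]
    # Lets the agent read the SAS token stored in Secret Manager.
    cmds.append(
        f"gcloud secrets add-iam-policy-binding {secret_id} "
        f"--member={member} --role=roles/secretmanager.secretAccessor"
    )
    return cmds


if __name__ == "__main__":
    # Hypothetical project number, bucket and secret ID.
    for cmd in grant_commands("123456789012", "my-dest-bucket", "azure-sas"):
        print(cmd)
```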