gcloud, google-cloud-bigtable, bigtable, sequencefile

Migrating a huge Bigtable database in GCP from one account to another using Dataflow


I have a huge database stored in Bigtable on GCP, and I am migrating the Bigtable data from one GCP account to another using Dataflow. When I ran a job to export the Bigtable table to sequence files, it created about 3,000 sequence files in the destination bucket. It is not practical to create a separate Dataflow job for each of the 3,000 sequence files. Is there a way to reduce the number of sequence files, or a way to pass all 3,000 sequence files at once to a single Dataflow job template in GCP?

We have two sequence files and want the data uploaded sequentially, one file after the other (expecting 10 rows and one column), but the result actually uploaded is 5 rows and 2 columns.


Solution

  • The sequence files should have some sort of pattern to their naming, e.g. gs://mybucket/somefolder/output-1, gs://mybucket/somefolder/output-2, gs://mybucket/somefolder/output-3, and so on.

    When running the Cloud Storage SequenceFile to Bigtable Dataflow template, set the sourcePattern parameter to a wildcard that matches that prefix, such as gs://mybucket/somefolder/output-* or gs://mybucket/somefolder/*. A single import job will then read all 3,000 files (see the example command after this list).
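
A minimal sketch of launching the template from the gcloud CLI with a wildcard sourcePattern. The project, instance, table, region, and bucket names are placeholders, and the template path and parameter names (bigtableProject, bigtableInstanceId, bigtableTableId, sourcePattern) are as I recall them from the public GCS_SequenceFile_to_Cloud_Bigtable template, so verify them against the current template documentation before running:

    # Launch the SequenceFile-to-Bigtable import template once,
    # pointing sourcePattern at all 3,000 files via a wildcard.
    # Quote the pattern so the shell does not expand the asterisk.
    gcloud dataflow jobs run import-sequencefiles \
        --gcs-location gs://dataflow-templates/latest/GCS_SequenceFile_to_Cloud_Bigtable \
        --region us-central1 \
        --parameters bigtableProject=my-dest-project,bigtableInstanceId=my-instance,bigtableTableId=my-table,sourcePattern='gs://mybucket/somefolder/output-*'

Because the pattern is resolved inside the Dataflow job, one job reads every matching sequence file; there is no need to create a separate job per file.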