google-cloud-platform · google-bigquery · google-cloud-data-transfer

Max file count using a BigQuery Data Transfer Service job


I have about 54,000 files in my Cloud Storage bucket. When I try to schedule a BigQuery Data Transfer Service job to move the files from the bucket into BigQuery, I get the following error:

Error code 9 : Transfer Run limits exceeded. Max size: 15.00 TB. Max file count: 10000. Found: size = 267065994 B (0.00 TB) ; file count = 54824.

I thought the max file count was 10 million.


Solution

  • I think that the BigQuery Data Transfer Service lists all the files matching the wildcard and then uses that list to load them. So it is the same as providing the full list to bq load, and therefore hits the 10,000-source-URI limit per load job (the first sketch below shows the equivalent call). This is probably necessary because the transfer service skips already-loaded files, so it needs to look at them one by one to decide which to actually load.

    I think that your only option is to schedule a job yourself and load the files directly into BigQuery, for example using Cloud Composer, or writing a small Cloud Run service that can be invoked by Cloud Scheduler; a sketch of such a service follows the first example below.
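
For illustration, here is roughly what that expanded-list load looks like with the Python client library; the bucket, object names, and table ID are invented for the example. A single load job accepts at most 10,000 source URIs, which is the ceiling the transfer run reports.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Loading from an explicit list of URIs, which is effectively what the
# transfer service does after expanding the wildcard. One load job is
# capped at 10,000 source URIs, so 54,824 files cannot fit in one job.
uris = [
    "gs://my-bucket/exports/file-00001.csv",  # hypothetical objects
    "gs://my-bucket/exports/file-00002.csv",
    # ... at most 10,000 entries per job
]
job = client.load_table_from_uri(
    uris,
    "my-project.my_dataset.my_table",  # hypothetical destination table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
    ),
)
job.result()  # wait for the load to finish
```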
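And a minimal sketch of the Cloud Run service, assuming CSV files under a single prefix; the bucket, prefix, and table ID are placeholders. It lists the objects itself and submits load jobs in batches of 10,000 URIs, so no single job exceeds the limit. Cloud Scheduler can then invoke the endpoint on whatever schedule the transfer job would have used.

```python
# app.py — a sketch, not a drop-in implementation; names below are placeholders.
import flask
from google.cloud import bigquery, storage

app = flask.Flask(__name__)

BUCKET = "my-bucket"                         # hypothetical bucket
PREFIX = "exports/"                          # hypothetical object prefix
TABLE_ID = "my-project.my_dataset.my_table"  # hypothetical table

@app.route("/", methods=["POST"])
def load_files():
    bq = bigquery.Client()
    gcs = storage.Client()

    # List every object under the prefix, much as the transfer service would.
    uris = [
        f"gs://{BUCKET}/{blob.name}"
        for blob in gcs.list_blobs(BUCKET, prefix=PREFIX)
    ]

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
    )

    # One load job accepts at most 10,000 source URIs, so batch the list.
    for start in range(0, len(uris), 10_000):
        job = bq.load_table_from_uri(
            uris[start:start + 10_000], TABLE_ID, job_config=job_config
        )
        job.result()  # block until this batch is loaded

    return f"loaded {len(uris)} files", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```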