google-cloud-dataflow google-cloud-bigtable

Specifying --diskSizeGb when running a Dataflow template


I'm trying to use a Google Dataflow template to export data from Bigtable to Google Cloud Storage (GCS). I'm following the gcloud command details here. However, when I run it I get a warning and an associated error whose suggested fix is to add workers (--numWorkers) or increase the attached disk size (--diskSizeGb). I see no way to pass those parameters when executing the Google-provided template. Am I missing something?

Reviewing a separate question, it seems like there is a way to do this. Can someone explain how?


Solution

  • Parameters like numWorkers and diskSizeGb are Dataflow-wide pipeline options. You should be able to specify them when running the template, like so:

    gcloud dataflow jobs run JOB_NAME \
        --gcs-location=LOCATION \
        --num-workers=NUM_WORKERS \
        --diskSizeGb=DISK_SIZE
    

    If gcloud complains about those flags, see the REST sketch below. Let me know if you have further questions.
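
    For reference, here is a minimal sketch of launching a classic template through the templates.launch REST endpoint, whose RuntimeEnvironment accepts numWorkers and diskSizeGb directly. The gcsPath (shown for the Bigtable-to-SequenceFile template), the Bigtable parameter names, and the bucket paths are illustrative placeholders, so check them against the documentation for the template you are actually running.

    # Illustrative sketch: launch a classic template via the REST API, setting
    # numWorkers and diskSizeGb in the runtime environment. Replace PROJECT_ID,
    # REGION, the bucket paths, and the Bigtable values with your own.
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/REGION/templates:launch?gcsPath=gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_SequenceFile" \
      -d '{
        "jobName": "bigtable-to-gcs-export",
        "parameters": {
          "bigtableProject": "PROJECT_ID",
          "bigtableInstanceId": "INSTANCE_ID",
          "bigtableTableId": "TABLE_ID",
          "destinationPath": "gs://BUCKET/export/",
          "filenamePrefix": "export-"
        },
        "environment": {
          "numWorkers": 10,
          "diskSizeGb": 100,
          "tempLocation": "gs://BUCKET/temp"
        }
      }'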