google-cloud-dataflow, apache-beam

Vertical autoscaling Dataflow experiment args don't get properly parsed


We want to enable vertical autoscaling on our Dataflow Prime pipeline for a Python container: https://cloud.google.com/dataflow/docs/vertical-autoscaling

We're trying to run our pipeline through this command:

gcloud dataflow jobs run process-data --additional-experiments=enable_prime --additional-experiments=enable_batch_vmr --additional-experiments=enable_vertical_memory_autoscaling ...

However, when we run this, our pipeline is not a Dataflow Prime pipeline (we can tell because the cost panel is enabled, which it shouldn't be if it were Dataflow Prime).

We see that all our experiment flags are fused together into one string, which does not enable autoscaling: ['enable_prime,enable_batch_vmr,enable_vertical_memory_autoscaling', 'beam_fn_api', 'use_unified_worker', 'use_runner_v2', 'use_portable_job_submission', 'use_multiple_sdk_containers']
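As a generic illustration (plain shell, not gcloud, and the `print_args` helper is hypothetical): each whitespace-separated token reaches a program as one argument, while a comma-joined value stays a single argument unless the receiving program splits it itself. That matches the fused experiment string captured above.

```shell
# Hypothetical helper: print each argument the program receives, one per line.
print_args() {
  for arg in "$@"; do
    echo "$arg"
  done
}

# Repeated flags arrive as separate arguments (two lines of output):
print_args --experiments=enable_prime --experiments=enable_batch_vmr

# A comma-joined value arrives as a single argument (one line of output):
print_args --experiments=enable_prime,enable_batch_vmr
```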

(screenshot: Dataflow capture of the experiments)

Has anyone been able to run this command with additional experiments?


Solution

  • Actually, if we look closely at the doc that @jggp1094 linked,

    we can see that these flags are meant to be passed under the '--experiments' flag (NOT --additional-experiments). They are then handed to the PipelineOptions as part of the Beam args and are parsed correctly in this manner.

    (screenshot of the linked doc)
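For a Python pipeline, that means passing the experiments as Beam pipeline options when launching the job. A minimal sketch of such a launch command — the script name, project, and region are hypothetical placeholders, not values from the original post:

```shell
# Sketch: launch the pipeline directly so the experiments reach
# PipelineOptions as Beam args via --experiments (which can be repeated).
# process_data.py, my-project, and us-central1 are placeholders.
python process_data.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --experiments=enable_prime \
  --experiments=enable_batch_vmr \
  --experiments=enable_vertical_memory_autoscaling
```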