Tags: google-cloud-platform, google-cloud-dataproc, trino, dataproc

Configure trino-jvm properties in GCP Dataproc on cluster create


I'm trying to configure trino-jvm properties while creating a Dataproc cluster. I'm following Google's documentation and am able to successfully create a cluster without any special JVM configuration, but am receiving an error when attempting to configure JVM properties.

Here's the CLI command that I'm running:

gcloud dataproc clusters create test-dataproc-cluster \
    --project=MY_PROJECT \
    --optional-components=TRINO \
    --enable-component-gateway \
    --region=us-central1 \
    --image-version=2.1 \
    --properties="trino-jvm:XX:+HeapDumpOnOutOfMemoryError"

Here's the error that I receive:

ERROR: (gcloud.dataproc.clusters.create) argument --properties: Bad syntax for dict arg: [trino-jvm:XX:+HeapDumpOnOutOfMemoryError]. Please see `gcloud topic flags-file` or `gcloud topic escaping` for information on providing list or dictionary flag values with special characters.

It looks like Dataproc expects the value of the --properties argument to be in dictionary form, i.e. --properties=PREFIX:KEY=VALUE. I'm able to successfully configure other properties that follow a key/value syntax. However, I'm unable to configure JVM properties such as bare flags that do not fit that key/value form.
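
For contrast, a property that does fit the key=value shape parses without error. For example, raising Trino's per-query memory limit (the `trino` prefix and the `query.max-memory` key are illustrative; check the Dataproc cluster-properties docs for the prefixes your image version supports):

```shell
# A key=value-shaped property that gcloud's dict parser accepts
# (the prefix and key shown here are illustrative examples).
--properties="trino:query.max-memory=2GB"
```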

How can I configure trino-jvm properties in Dataproc?


Solution

  • You can use the --properties flag to specify the Trino JVM setting as trino.jvm-extras=-XX:+HeapDumpOnOutOfMemoryError, which follows the expected key=value format:

    --properties trino-env-config=trino.jvm-extras=-XX:+HeapDumpOnOutOfMemoryError
    

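    Putting the pieces together, a complete cluster-create invocation using this property might look like the sketch below (the project ID is the placeholder from the question; the `trino-env-config` property string is taken as-is from the suggestion above and should be verified against the Dataproc docs for your image version):

    ```shell
    # Sketch: create a Dataproc cluster with an extra Trino JVM flag.
    # MY_PROJECT is a placeholder; adjust region and image version as needed.
    gcloud dataproc clusters create test-dataproc-cluster \
        --project=MY_PROJECT \
        --optional-components=TRINO \
        --enable-component-gateway \
        --region=us-central1 \
        --image-version=2.1 \
        --properties="trino-env-config=trino.jvm-extras=-XX:+HeapDumpOnOutOfMemoryError"
    ```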
    Aside from the --properties flag, one workaround for the error is to use the --metadata flag, since it accepts arbitrary key-value pairs. The metadata can then supply the necessary JVM properties for Trino during cluster creation.

    --metadata trino-jvm=XX:+HeapDumpOnOutOfMemoryError
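
    Note that --metadata on its own only stores the value on the cluster's VMs; something still has to read it and apply it to Trino's configuration. A minimal initialization-action sketch is below (the jvm.config path and the trino service name are assumptions about the Dataproc image; /usr/share/google/get_metadata_value is the helper typically present on Dataproc VMs):

    ```shell
    #!/bin/bash
    # Hypothetical init action: append a JVM flag from cluster metadata
    # to Trino's jvm.config and restart the service.
    set -euo pipefail

    # Read the value passed via: --metadata trino-jvm=XX:+HeapDumpOnOutOfMemoryError
    FLAG="$(/usr/share/google/get_metadata_value attributes/trino-jvm)"

    # jvm.config entries are one flag per line; the metadata value above
    # omits the leading dash, so re-add it here.
    echo "-${FLAG}" >> /etc/trino/conf/jvm.config  # config path is an assumption

    systemctl restart trino  # service name is an assumption
    ```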