I am using the clusters.create API from Python to create Dataproc clusters, with the following request body:
{
  "projectId": "my-project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "initializationActions": [
      {
        "executableFile": "gs://cloud-example-bucket/my-init-action.sh"
      }
    ]
  }
}
With the gcloud command line, the connector versions and the initialization script are specified as:
gcloud dataproc clusters create <CLUSTER_NAME> \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh \
    --metadata 'gcs-connector-version=1.7.0' \
    --metadata 'bigquery-connector-version=0.11.0'
How do I pass the connector versions (the metadata values) through the API?
Running my script without passing the versions gives the following errors:
ERROR: None of connector versions are specified'
ERROR: None of connector versions are specified
+ exit 1
The metadata field can be specified under config/gceClusterConfig as follows:
"config": {
    "gceClusterConfig": {
        "metadata": {
            "bigquery-connector-version": "0.12.1",
            "gcs-connector-version": "1.8.1"
        }
    }
}
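For completeness, here is a minimal Python sketch of how the full request body might be assembled with the metadata in place. The helper name build_cluster_body and its parameters are hypothetical, introduced only for illustration. The commented-out call at the end assumes the google-api-python-client discovery interface, and the region value there is a placeholder; adapt both to whatever client your script already uses.

```python
import json


def build_cluster_body(project_id, cluster_name, zone, init_action_uri, metadata):
    """Return a clusters.create request body (hypothetical helper).

    The metadata dict is placed under config.gceClusterConfig, which is the
    API counterpart of the gcloud --metadata flags.
    """
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {
                "zoneUri": zone,
                # Keys and values are plain strings, as in the answer above.
                "metadata": dict(metadata),
            },
            "initializationActions": [
                {"executableFile": init_action_uri},
            ],
        },
    }


body = build_cluster_body(
    project_id="my-project-id",
    cluster_name="example-cluster",
    zone="us-central1-b",
    init_action_uri="gs://dataproc-initialization-actions/connectors/connectors.sh",
    metadata={
        "gcs-connector-version": "1.8.1",
        "bigquery-connector-version": "0.12.1",
    },
)

# Inspect the body before sending it.
print(json.dumps(body, indent=2))

# Assumption: with google-api-python-client the body would be submitted
# roughly like this (not executed here; requires credentials):
#
#   from googleapiclient import discovery
#   dataproc = discovery.build("dataproc", "v1")
#   dataproc.projects().regions().clusters().create(
#       projectId="my-project-id", region="<REGION>", body=body
#   ).execute()
```

The masterConfig and workerConfig sections from the question are omitted here for brevity; they can be merged into the "config" dict unchanged.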