According to the public documentation, it is possible to run Cloud Dataflow jobs on Shielded VMs in GCP.
For a non-templated job, such as the one described in the Java quickstart, this can be achieved by passing the --dataflowServiceOptions=enable_secure_boot flag as follows:
mvn -Pdataflow-runner compile exec:java \
  -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Djava.util.logging.config.file=logging.properties \
  -Dexec.args="--project=${PROJECT_ID} \
  --gcpTempLocation=gs://${BUCKET_NAME}/temp/ \
  --output=gs://${BUCKET_NAME}/output \
  --runner=DataflowRunner \
  --region=${REGION} \
  --dataflowServiceOptions=enable_secure_boot"
However, when the job is launched from a template, e.g. with gcloud or Terraform:
gcloud dataflow jobs run word-count \
  --gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count \
  --region ${REGION} \
  --staging-location gs://${BUCKET_NAME}/temp \
  --parameters inputFile=gs://${BUCKET_NAME}/sample.txt,output=gs://${BUCKET_NAME}/sample-output
the worker VM that gets started is not Shielded (judging by its "Secure Boot" flag at runtime).
How can I run a templated Dataflow job in a Shielded VM on GCP?
To deploy a templated Dataflow job on Shielded VMs with gcloud, set the --additional-experiments flag to enable_secure_boot. I tested this and confirmed that Secure Boot was enabled on the worker VM while the job was running:
gcloud dataflow jobs run word-count-on-shielded-vm-from-gcloud --project=project-id \
--gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count \
--region us-central1 --staging-location gs://bucket-name/temp \
--parameters inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt,output=gs://bucket-name/sample-output \
--additional-experiments=enable_secure_boot
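To check the result the same way the question does, the Secure Boot setting of the running worker VMs can be read with gcloud compute instances list. The sketch below defines a hypothetical helper, list_secure_boot_status; it assumes the Dataflow worker VM names contain "harness", so pass a different regular expression if your workers are named differently.

```shell
# Hypothetical helper: list the VMs whose name matches a regular expression
# and show whether Secure Boot is enabled on each of them.
# Assumption: Dataflow worker VM names contain "harness" (the default
# pattern); pass another regular expression if yours differ.
list_secure_boot_status() {
  local name_regex="${1:-harness}"
  # shieldedInstanceConfig.enableSecureBoot is the Compute Engine
  # instance field behind the "Secure Boot" setting.
  gcloud compute instances list \
    --filter="name~${name_regex}" \
    --format="table(name,zone.basename(),shieldedInstanceConfig.enableSecureBoot)"
}
```

Run it while the job is still active, since Dataflow deletes the worker VMs once the job finishes.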
With Terraform, add the additional_experiments argument with the value enable_secure_boot to the google_dataflow_job resource to deploy the Dataflow job on Shielded VMs:
resource "google_dataflow_job" "word_count_job" {
  name              = "sample-dataflow-wordcount-job"
  template_gcs_path = "gs://dataflow-templates-europe-west3/latest/Word_Count"
  temp_gcs_location = "${google_storage_bucket.bucket.url}/temp"

  parameters = {
    inputFile = "${google_storage_bucket.bucket.url}/input_file.txt"
    output    = "${google_storage_bucket.bucket.url}/word_count.txt"
  }

  # Start the worker VMs with Secure Boot enabled (Shielded VMs)
  additional_experiments = [
    "enable_secure_boot"
  ]
}
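After terraform apply, one way to confirm that the experiment reached the launched job is to inspect the job itself. The sketch below defines a hypothetical helper, job_has_experiment; it assumes the service reports the requested experiments in the environment.experiments field of the full job view (gcloud dataflow jobs describe --full). The job ID is available from the job_id attribute of the google_dataflow_job resource.

```shell
# Hypothetical helper: exit with status 0 if the Dataflow job lists the given
# experiment in its environment, non-zero otherwise.
# Assumption: requested experiments appear in environment.experiments of the
# full job view; gcloud's value() format joins list items with ';'.
job_has_experiment() {
  local job_id="$1" region="$2" experiment="$3"
  gcloud dataflow jobs describe "$job_id" \
    --region="$region" \
    --full \
    --format="value(environment.experiments)" \
    | tr ';' '\n' \
    | grep -qxF -- "$experiment"
}
```

For example: job_has_experiment "${JOB_ID}" "${REGION}" enable_secure_boot && echo "enable_secure_boot is set".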