I am trying to follow, for the most part, this guide to deploy a Google Workflow that launches a container-optimized VM on Compute Engine, runs a long-running task, and then cleans up the VM. I am also using the GCP docs on Creating a Workflow with Terraform.
I have my image built. It runs successfully on my local machine, and it also runs on a container-optimized VM when I launch one through the console. The image is built in my CI/CD pipeline on Cloud Build and uploaded to Google Container Registry.
However, when I try to launch the container via the Cloud Workflows console, I get an error that appears to be related to authentication against GCR. Here is the stack trace from Cloud Logging.
I can't explain this, because both the VM launched from the console and the one launched via Workflows use the same service account with the same scopes. The guide I am following suggests taking the REST equivalent of a known-working VM and converting it into a YAML config for the Terraform code, which is what I have done. And when I compare the two machines in the console, they look identical to me.
This is the Terraform code used to create the workflow. I pass the commit SHA into the Terraform script to reference the container. I have confirmed that the container built on Cloud Build can be downloaded and run on my local machine.
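For reference, this is roughly how the commit SHA variable would be wired up; the variable name matches the one referenced in the workflow below, but the declaration and the apply-time invocation are assumptions, not copied from my actual setup:

```hcl
variable "SCRAPER_IMAGE_COMMIT_SHA" {
  description = "Tag of the scraper image in GCR (the commit SHA produced by Cloud Build)"
  type        = string
}
```

It is then supplied at apply time, e.g. `terraform apply -var="SCRAPER_IMAGE_COMMIT_SHA=$COMMIT_SHA"`.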
resource "google_workflows_workflow" "workflows_example" {
  name            = "scraper-workflow"
  region          = "us-central1"
  description     = "Scraper workflow"
  service_account = "scraper-workflow-executor@rdmops-219503.iam.gserviceaccount.com"
  source_contents = <<-EOF
  # FYI, in Terraform you need to escape "$" as "$$" or it will cause errors.
  - init:
      assign:
        - commitSHA: ${var.SCRAPER_IMAGE_COMMIT_SHA}
        - projectId: $${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
        - projectNumber: $${sys.get_env("GOOGLE_CLOUD_PROJECT_NUMBER")}
        - zone: "us-central1-a"
        - machineType: "e2-medium"
        - ticker: $${args.ticker}
        - instanceName: $${ticker + "-scraper"}
  - create_and_start_vm:
      call: googleapis.compute.v1.instances.insert
      args:
        project: $${projectId}
        zone: $${zone}
        body:
          canIpForward: false
          confidentialInstanceConfig:
            enableConfidentialCompute: false
          deletionProtection: false
          shieldedInstanceConfig:
            enableIntegrityMonitoring: true
            enableSecureBoot: false
            enableVtpm: true
          tags:
            items:
              - http-server
              - https-server
          name: $${instanceName}
          labels:
            container-vm: "cos-stable-97-16919-29-40"
          machineType: $${"zones/" + zone + "/machineTypes/" + machineType}
          disks:
            - initializeParams:
                diskSizeGb: "10"
                diskType: "projects/rdmops-219503/zones/us-central1-a/diskTypes/pd-balanced"
                sourceImage: "projects/cos-cloud/global/images/cos-stable-97-16919-29-40"
              boot: true
              autoDelete: true
              deviceName: $${instanceName}
          # Needed to make sure the VM has an external IP
          networkInterfaces:
            - accessConfigs:
                - name: "External NAT"
                  networkTier: "PREMIUM"
              stackType: "IPV4_ONLY"
              subnetwork: "projects/rdmops-219503/regions/us-central1/subnetworks/default"
          # The container to run
          metadata:
            items:
              - key: "google-logging-enabled"
                value: "true"
              - key: "gce-container-declaration"
                value: '$${"spec:\n containers:\n - name: scraper-workflow\n image: gcr.io/" + projectId + "/scraper-workflow:" + commitSHA + "\n stdin: false\n tty: false\n restartPolicy: Never\n"}'
          # Needed to be able to pull down and run the container
          serviceAccounts:
            - email: 937088654099-compute@developer.gserviceaccount.com
              scopes:
                - https://www.googleapis.com/auth/devstorage.read_only
                - https://www.googleapis.com/auth/logging.write
                - https://www.googleapis.com/auth/monitoring.write
                - https://www.googleapis.com/auth/servicecontrol
                - https://www.googleapis.com/auth/service.management.readonly
                - https://www.googleapis.com/auth/trace.append
  - log_wait_for_vm_network:
      call: sys.log
      args:
        data: $${"Waiting for VM network to initialize"}
  - wait_for_vm_network:
      call: sys.sleep
      args:
        seconds: 10
  - get_instance:
      call: googleapis.compute.v1.instances.get
      args:
        instance: $${instanceName}
        project: $${projectId}
        zone: $${zone}
      result: instance
  - extract_external_ip_and_construct_urls:
      assign:
        - external_ip: $${instance.networkInterfaces[0].accessConfigs[0].natIP}
        - base_url: $${"http://" + external_ip + "/"}
        - start_url: $${base_url + "start"}
        - poll_url: $${base_url + "poll"}
  # Redacted rest of workflow for simplicity
  EOF
}
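As a sanity check, the string concatenation that builds the `gce-container-declaration` metadata value can be reproduced outside the workflow. This is a minimal Python sketch (the project ID and SHA values are placeholders): if the commit SHA variable ever arrives empty, the image reference ends in a bare `:`, which GCR cannot resolve, producing an image-not-found error rather than a permissions error.

```python
def container_declaration(project_id: str, commit_sha: str) -> str:
    """Mirror the concatenation used in the workflow's
    gce-container-declaration metadata value."""
    return (
        "spec:\n containers:\n - name: scraper-workflow\n image: gcr.io/"
        + project_id + "/scraper-workflow:" + commit_sha
        + "\n stdin: false\n tty: false\n restartPolicy: Never\n"
    )

# With a real SHA, the image reference is well-formed.
ok = container_declaration("rdmops-219503", "abc1234")
assert "image: gcr.io/rdmops-219503/scraper-workflow:abc1234" in ok

# With an empty SHA (e.g. the Terraform variable was never set),
# the tag is missing entirely and the pull will fail.
bad = container_declaration("rdmops-219503", "")
assert "/scraper-workflow:\n" in bad
```

Printing the result and comparing it against the declaration on the working console-launched VM is a quick way to spot a mismatched or empty tag.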
And this is the REST equivalent of the instance I create through the console, which works fine. The service account looks the same to me, as do the scopes and all of the other info.
POST https://www.googleapis.com/compute/v1/projects/rdmops-219503/zones/us-central1-a/instances
{
  "canIpForward": false,
  "confidentialInstanceConfig": {
    "enableConfidentialCompute": false
  },
  "deletionProtection": false,
  "description": "",
  "disks": [
    {
      "autoDelete": true,
      "boot": true,
      "deviceName": "instance-1",
      "initializeParams": {
        "diskSizeGb": "10",
        "diskType": "projects/rdmops-219503/zones/us-central1-a/diskTypes/pd-balanced",
        "labels": {},
        "sourceImage": "projects/cos-cloud/global/images/cos-stable-97-16919-29-40"
      },
      "mode": "READ_WRITE",
      "type": "PERSISTENT"
    }
  ],
  "displayDevice": {
    "enableDisplay": false
  },
  "guestAccelerators": [],
  "keyRevocationActionType": "NONE",
  "labels": {
    "container-vm": "cos-stable-97-16919-29-40"
  },
  "machineType": "projects/rdmops-219503/zones/us-central1-a/machineTypes/e2-medium",
  "metadata": {
    "items": [
      {
        "key": "ssh-keys",
        "value": "XXXXX"
      },
      {
        "key": "gce-container-declaration",
        "value": "spec:\n containers:\n - name: instance-9\n image: gcr.io/rdmops-219503/workflow-scraper:test3\n args:\n - ''\n stdin: false\n tty: false\n restartPolicy: Never\n# This container declaration format is not public API and may change without notice. Please\n# use gcloud command-line tool or Google Cloud Console to run Containers on Google Compute Engine."
      }
    ]
  },
  "name": "instance-9",
  "networkInterfaces": [
    {
      "accessConfigs": [
        {
          "name": "External NAT",
          "networkTier": "PREMIUM"
        }
      ],
      "stackType": "IPV4_ONLY",
      "subnetwork": "projects/rdmops-219503/regions/us-central1/subnetworks/default"
    }
  ],
  "reservationAffinity": {
    "consumeReservationType": "ANY_RESERVATION"
  },
  "scheduling": {
    "automaticRestart": true,
    "onHostMaintenance": "MIGRATE",
    "preemptible": false,
    "provisioningModel": "STANDARD"
  },
  "serviceAccounts": [
    {
      "email": "937088654099-compute@developer.gserviceaccount.com",
      "scopes": [
        "https://www.googleapis.com/auth/devstorage.read_only",
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/monitoring.write",
        "https://www.googleapis.com/auth/servicecontrol",
        "https://www.googleapis.com/auth/service.management.readonly",
        "https://www.googleapis.com/auth/trace.append"
      ]
    }
  ],
  "shieldedInstanceConfig": {
    "enableIntegrityMonitoring": true,
    "enableSecureBoot": false,
    "enableVtpm": true
  },
  "tags": {
    "items": [
      "http-server",
      "https-server"
    ]
  },
  "zone": "projects/rdmops-219503/zones/us-central1-a"
}
It looks like the cause of the problem is that your image can't be found, not that your service account is disallowed from accessing Container Registry.
The service account associated with your GCE instance only requires Cloud Storage read access (the `devstorage.read_only` scope it already has) to pull the private Docker image.
A permissions error would look something like this, as I tested:
pull access denied for gcr.io/&lt;PROJECT_ID&gt;/&lt;IMAGE&gt;:&lt;TAG&gt;, repository does not exist or may require 'docker login'
However, the error message you are seeing is the one returned when the image cannot be found. I think the problem is rooted in the commit SHA you are using to identify the image, given that I could execute your workflow successfully using a predefined container tag (similar to what you did in the REST API call for GCE).
Without seeing your entire CI/CD pipeline configuration, this can't be confirmed, but that is the section I would focus on. Let me know if this was useful.
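For comparison, a Cloud Build config that tags the image with the built-in `$COMMIT_SHA` substitution would look roughly like the sketch below; the step and image names are assumptions, not taken from your pipeline:

```yaml
steps:
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "gcr.io/$PROJECT_ID/scraper-workflow:$COMMIT_SHA", "."]
images:
  - "gcr.io/$PROJECT_ID/scraper-workflow:$COMMIT_SHA"
```

You can then list the tags that actually exist in GCR with `gcloud container images list-tags gcr.io/&lt;PROJECT_ID&gt;/scraper-workflow` and check that the SHA Terraform injects into the workflow matches one of them exactly.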