I have deployed a containerized R application to Google Cloud Run using Docker. Since I want to run the code at a regular interval (not request-based), I have set up a Cloud Scheduler job which invokes the container via an HTTP GET request. I am getting a 503 error response, and I cannot figure out why. Here is the detailed log message:
{
  httpRequest: {
    status: 503
  }
  insertId: "qx6q58f4iewwp"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
    jobName: "projects/PROJECT_ID/locations/europe-west1/jobs/scheduled-cloud-run-job"
    status: "UNAVAILABLE"
    targetType: "HTTP"
    url: "CONTAINER_URL"
  }
  logName: "projects/PROJECT_ID/logs/cloudscheduler.googleapis.com%2Fexecutions"
  receiveTimestamp: "2023-06-06T13:20:05.745924313Z"
  resource: {
    labels: {
      job_id: "scheduled-cloud-run-job"
      location: "europe-west1"
      project_id: "PROJECT_ID"
    }
    type: "cloud_scheduler_job"
  }
  severity: "ERROR"
  timestamp: "2023-06-06T13:20:05.745924313Z"
}
Here is the Terraform configuration for the Cloud Scheduler job:
# -- Create cloud scheduler job -- #
resource "google_cloud_scheduler_job" "default" {
  name        = "scheduled-cloud-run-job"
  description = "Invokes the Cloud Run container with our pipeline on a recurrent basis."
  schedule    = "*/10 * * * *"
  time_zone   = "Europe/Stockholm"

  retry_config {
    retry_count = 1
  }

  http_target {
    http_method = "GET"
    uri         = google_cloud_run_service.default.status[0].url
    #body    = base64encode("{\"run_container\": \"run\"}")
    #headers = { "Content-Type" : "application/json", "User-Agent" : "Google-Cloud-Scheduler" }

    oidc_token {
      service_account_email = google_service_account.default.email
    }
  }
}
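(Not shown above: the google_service_account.default referenced by the oidc_token block is created elsewhere in my configuration. A minimal sketch of what such a definition looks like, with an illustrative account_id, is below.)

# -- Service account the scheduler uses to mint its OIDC token (sketch) -- #
resource "google_service_account" "default" {
  project      = var.project_id
  account_id   = "cloud-run-invoker" # illustrative name, not my exact config
  display_name = "Invoker service account for Cloud Scheduler"
}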
Here is the Terraform configuration for my Cloud Run service:
resource "google_cloud_run_service" "default" {
name = "containerized-pipeline"
location = var.region
project = var.project_id
template {
spec {
containers {
image = "${local.artifact_storage_address}:${local.tag}"
ports {
#name = "h2c"
container_port = 8080
}
resources {
limits = {
"cpu" = "1000m"
"memory" = "2000Mi"
}
}
}
container_concurrency = 1
}
metadata {
annotations = {
"run.googleapis.com/client-name" = "terraform"
"autoscaling.knative.dev/minScale" = 1
"autoscaling.knative.dev/maxScale" = 30
# "run.googleapis.com/cpu-throttling" = false
}
}
}
traffic {
percent = 100
latest_revision = true
}
depends_on = [
null_resource.docker_build
]
}
data "google_iam_policy" "noauth" {
binding {
role = "roles/run.invoker"
members = ["allUsers"]
}
}
resource "google_cloud_run_service_iam_policy" "noauth" {
location = google_cloud_run_service.default.location
project = google_cloud_run_service.default.project
service = google_cloud_run_service.default.name
policy_data = data.google_iam_policy.noauth.policy_data
}
And here is the part of my source code which starts up the API server in my container and handles the GET requests sent by Cloud Scheduler:
library(beakr) # HTTP server framework
# `con` (database connection) and `q` (date query) are defined earlier in the script.

runPipeline = function(run_container) {
  body = as.character(run_container)
  if (body == "run") {
    date = as.character(dbGetQuery(con, q)$`f0_`)
    if (date == "2017-01-01") {
      # First run: full (initial) load into BigQuery
      load2BQinitial(data = reformatData(data = cleanData(df = getData(date = date))))
    } else {
      # Subsequent runs: incremental load
      load2BQincremental(data = reformatData(data = cleanData(df = getData(date = date))))
    }
    return(paste0("Running the pipeline starting from ", date))
  } else {
    return("Something went wrong, please make sure the GET request is sent correctly.")
  }
}

# -- Create API endpoint to receive & feed new data to model -- #
newBeakr() %>%
  httpGET(path = "/launch", decorate(runPipeline)) %>% # Respond to GET requests at the "/launch" route
  handleErrors() %>%                                   # Handle any errors with a JSON response
  listen(host = "0.0.0.0", port = 8080)                # Start the server on port 8080
I came across several Google issue trackers for 500 and 503 errors, but none of them offered a conclusive solution. Does anyone have any idea why I am getting the 503?
Sending a curl request directly to the service produces the following request log entry:
{
  httpRequest: {
    latency: "2.733624s"
    protocol: "H2C"
    remoteIp: "X.X.X.X"
    requestMethod: "GET"
    requestSize: "544"
    requestUrl: "<URL>a.run.app"
    responseSize: "1238"
    serverIp: "X.X.X.X"
    status: 503
    userAgent: "curl/7.74.0"
  }
  insertId: "XXXXXXXXXXXXXXXX"
  labels: {
    instanceId: "XXXXXXXXXXXXX"
  }
  logName: "projects/PROJECT_ID/logs/run.googleapis.com%2Frequests"
  receiveTimestamp: "2023-06-07T07:01:14.296557338Z"
  resource: {
    labels: {
      configuration_name: "containerized-pipeline"
      location: "europe-west1"
      project_id: "PROJECT_ID"
      revision_name: "containerized-pipeline-00001-pwn"
      service_name: "containerized-pipeline"
    }
    type: "cloud_run_revision"
  }
  severity: "ERROR"
  spanId: "12660848597156063613"
  textPayload: "The request failed because either the HTTP response was malformed or connection to the instance had an error. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#malformed-response-or-connection-error"
  timestamp: "2023-06-07T07:01:11.486699Z"
  trace: "projects/PROJECT_ID/traces/34366480e3a4143d4008eed0452eacf0"
  traceSampled: true
}
UPDATE: I should also mention that I ran my R application locally and sent an API request to localhost, and my callback function worked just fine. So the issue has something to do with how my Cloud Run instance and Cloud Scheduler are interacting; I just don't know what that issue is.
So after some additional debugging I was able to figure out what the issue was and resolve it. Though I'm not sure whether the 503 error I was getting accurately reflects this, the cause was a mismatch in the location of my resources.

For more context: my Terraform state-file bucket is in europe-west4, and all of my resources were originally in europe-west4 as well. Somewhere along the line, I decided to create my Cloud Scheduler job and did so in region europe-west1, so I also re-created the Artifact Registry, container, and Cloud Run service in europe-west1. When debugging, I eventually changed the location of the Artifact Registry and Cloud Run service back to europe-west4 and kept the scheduler in europe-west1, at which point I stopped getting the 503 error. Everything works completely fine now.
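What seems to have mattered is that the Artifact Registry repository and the Cloud Run service share a region, so I now derive both locations from a single variable. Here is a minimal sketch of that idea (the repository resource below is illustrative; the names and IDs are placeholders, not my exact configuration):

# -- Keep region-scoped resources aligned via one variable (illustrative sketch) -- #
variable "region" {
  description = "Shared region for the Artifact Registry repository and the Cloud Run service"
  type        = string
  default     = "europe-west4"
}

# Placeholder repository definition; my real one lives elsewhere.
resource "google_artifact_registry_repository" "pipeline" {
  project       = var.project_id
  location      = var.region # same region as the Cloud Run service
  repository_id = "pipeline-images"
  format        = "DOCKER"
}

# The Cloud Run service then reuses the same variable, as in the service
# definition shown in the question:
#   location = var.region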
Not sure how useful or replicable my solution is, but hopefully it helps anyone who runs into the same issue.