google-cloud-platform google-cloud-ml google-cloud-vertex-ai

Scheduling execution of a Vertex AI notebook returns error: "2: Unknown Error.: Unable to submit schedule"


I'm using GCP's Vertex AI Workbench to develop ML code in a Managed Notebook. The notebook execution works just fine both when run interactively, and when run by an Executor with Type: "One-time execution" from the notebook's web interface.

On the other hand, whenever I select Type: "Schedule-based recurring executions" and click on SUBMIT, it hangs for about 30 seconds with the message "Submitting schedule", then it fails with the following error:

2: Unknown Error.: Unable to submit schedule

All the parameters of the execution are the same in both cases.

I believe there must be something very obvious that I'm missing here, but I couldn't find anything useful in the documentation or anywhere else.

EDIT:

This is the full log of the error, which I don't find helpful either:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 2,
      "message": "Error 0: googleapi: Error 503: The service is currently unavailable.\nDetails:\n[\n  {\n    \"@type\": \"type.googleapis.com/google.rpc.DebugInfo\",\n    \"detail\": \"[ORIGINAL ERROR] RPC::DEADLINE_EXCEEDED: Deadline exceeded while waiting for channels to become available, and/or be able to accept RPCs. Channel count = 0. Queue size = 0. Deadline=28.992677276s; Extensible Stubs capped the outgoing deadline: see go/stubs-longest-allowed-deadline; \\ncom.google.net.rpc3.RpcException: DEADLINE_EXCEEDED: Deadline exceeded while waiting for channels to become available, and/or be able to accept RPCs. Channel count = 0. Queue size = 0. Deadline=28.992677276s; Extensible Stubs capped the outgoing deadline: see go/stubs-longest-allowed-deadline; \\n\\tat com.google.apphosting.admin.mixer.actions.CreateApplicationAction.checkRpcPresentStatus(CreateApplicationAction.java:194)\\n\\tSuppressed: com.google.common.labs.concurrent.LabsFutures$LabeledExecutionException: GraphFuture{key=@com.google.apphosting.admin.mixer.actions.CreateApplicationAction$CreateReturnMessage com.google.protobuf.Message} failed: com.google.net.rpc3.RpcException: DEADLINE_EXCEEDED: Deadline exceeded while waiting for channels to become available, and/or be able to accept RPCs. Channel count = 0. Queue size = 0. Deadline=28.992677276s; Extensible Stubs capped the outgoing deadline: see go/stubs-longest-allowed-deadline; \\n\"\n  }\n]\n, backendError"
    },
    "authenticationInfo": {
      "principalEmail": "xxxxxxxxxx@developer.gserviceaccount.com",
      "serviceAccountDelegationInfo": [
        {
          "firstPartyPrincipal": {
            "principalEmail": "xxxxxxxxxx@gcp-sa-notebooks.iam.gserviceaccount.com"
          }
        }
      ]
    },
    "requestMetadata": {
      "requestAttributes": {},
      "destinationAttributes": {}
    },
    "serviceName": "notebooks.googleapis.com",
    "methodName": "google.cloud.notebooks.v1.NotebookService.CreateSchedule",
    "resourceName": "projects/MY-PROJECT/locations/europe-west4/schedules/test_rnn_aria__1673534440196",
    "resourceLocation": {
      "currentLocations": [
        "europe-west4"
      ]
    }
  },
  "insertId": "184n01ud3oul",
  "resource": {
    "type": "audited_resource",
    "labels": {
      "method": "google.cloud.notebooks.v1.NotebookService.CreateSchedule",
      "service": "notebooks.googleapis.com",
      "project_id": "MY-PROJECT"
    }
  },
  "timestamp": "2023-01-12T14:42:42.563449708Z",
  "severity": "ERROR",
  "logName": "projects/MY-PROJECT/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "projects/MY-PROJECT/locations/europe-west4/operations/operation-XXXXXXXXXX",
    "producer": "notebooks.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2023-01-12T14:42:43.526355300Z"
}
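
For anyone trying to read an entry like this, here is a minimal sketch that pulls the few informative fields out of the audit log: the failing method, the HTTP status, and the underlying RPC error. The function name `summarize_audit_error` and the regular expressions are my own assumptions, not part of any Google library, and the sample message is a trimmed copy of the one above.

```python
import json
import re


def summarize_audit_error(entry: dict) -> dict:
    """Extract the informative fields from a Cloud Audit Log error entry.

    Hypothetical helper: the patterns below are guesses based on the
    single log entry shown in this question.
    """
    payload = entry.get("protoPayload", {})
    status = payload.get("status", {})
    message = status.get("message", "")

    # "Error 503: The service is currently unavailable." (three-digit HTTP status)
    http = re.search(r"Error (\d{3}): ([^\n]+)", message)
    # "RPC::DEADLINE_EXCEEDED" (the backend RPC error name)
    rpc = re.search(r"RPC::([A-Z_]+)", message)

    return {
        "method": payload.get("methodName"),
        "resource": payload.get("resourceName"),
        "grpc_code": status.get("code"),
        "http_status": int(http.group(1)) if http else None,
        "http_reason": http.group(2).strip() if http else None,
        "rpc_error": rpc.group(1) if rpc else None,
    }


# Trimmed copy of the entry above; the message is shortened for readability.
sample_entry = {
    "protoPayload": {
        "status": {
            "code": 2,
            "message": (
                "Error 0: googleapi: Error 503: The service is currently unavailable.\n"
                "Details:\n"
                "[ORIGINAL ERROR] RPC::DEADLINE_EXCEEDED: Deadline exceeded while "
                "waiting for channels to become available"
            ),
        },
        "methodName": "google.cloud.notebooks.v1.NotebookService.CreateSchedule",
        "resourceName": "projects/MY-PROJECT/locations/europe-west4/schedules/test_rnn_aria__1673534440196",
    }
}

summary = summarize_audit_error(sample_entry)
print(json.dumps(summary, indent=2))
```

Read this way, the entry only says that `CreateSchedule` hit a 503 from a backend call that timed out (`DEADLINE_EXCEEDED`); it gives no hint about anything wrong in the schedule parameters themselves.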

Appreciate your help!


Solution

  • Long overdue update, hope it helps other people struggling with this elusive issue.

    The official answer from Google is that "notebook scheduling is no longer in focus", and they suggested using Vertex Pipelines instead.

    I guess that's a polite way of saying "we're decommissioning the feature and we can't be bothered to warn users until their stuff stops working".