kubernetes google-kubernetes-engine google-artifact-registry autopilot

Cannot pull a container image into GKE Autopilot from a private Artifact Registry even though both are in the same project


According to the articles below, GKE should be able to pull container images from Artifact Registry without any additional authentication when both are in the same project.

https://cloud.google.com/artifact-registry/docs/integrate-gke

https://www.youtube.com/watch?v=BfS7mvPA-og

Error: ImagePullBackOff and Error: ErrImagePull errors with GKE

But when I tried it, I got an ImagePullBackOff error.
Have I made a mistake or misunderstood something? Or do I need to use some other form of authentication?

Reproduce

The easiest way to reproduce this is with Google Cloud Shell in any project on https://console.cloud.google.com .

Create Artifact Registry

gcloud artifacts repositories create test \
    --repository-format=docker \
    --location=asia-northeast2

Push sample image

gcloud auth configure-docker asia-northeast2-docker.pkg.dev
docker pull nginx
docker tag nginx asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
docker push asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image

Create GKE Autopilot cluster

Create the GKE Autopilot cluster from the Cloud console.

Almost all options are left at their defaults, but I changed these two.

Deploy container image to GKE from Artifact Registry

gcloud container clusters get-credentials test --region asia-northeast2
kubectl run test --image asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image

Check Pod state

kubectl describe po test
Name:             test
Namespace:        default
Priority:         0
Service Account:  default
Node:             xxxxxxxxxxxxxxxxxxx
Start Time:       Wed, 08 Feb 2023 12:38:08 +0000
Labels:           run=test
Annotations:      autopilot.gke.io/resource-adjustment:
                    {"input":{"containers":[{"name":"test"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"reque...
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Pending
IP:               10.73.0.25
IPs:
  IP:  10.73.0.25
Containers:
  test:
    Container ID:
    Image:          asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-szq85 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-szq85:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From                                   Message
  ----     ------     ----  ----                                   -------
  Normal   Scheduled  19s   gke.io/optimize-utilization-scheduler  Successfully assigned default/test to xxxxxxxxxxxxxxxxxxx
  Normal   Pulling    16s   kubelet                                Pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
  Warning  Failed     16s   kubelet                                Failed to pull image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image": rpc error: code = Unknown desc = failed to pull and unpack image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to resolve reference "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
  Warning  Failed     16s   kubelet                                Error: ErrImagePull
  Normal   BackOff    15s   kubelet                                Back-off pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
  Warning  Failed     15s   kubelet                                Error: ImagePullBackOff

Then I got ImagePullBackOff.


Solution

  • The 403 Forbidden in the pull event suggests that the service account used by the GKE Autopilot nodes does not have permission to read from the Artifact Registry repository. You can grant that permission by adding the roles/artifactregistry.reader role on the repository to the service account that the cluster's nodes are configured to use. In the command below, <nnn> is the project number, so the member is the Compute Engine default service account:

    gcloud artifacts repositories add-iam-policy-binding <repository-name> \
      --location=<location> \
      --member=serviceAccount:<nnn>-compute@developer.gserviceaccount.com \
      --role="roles/artifactregistry.reader"
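
As a minimal sketch of how the placeholders in the command above could be filled in, assuming the nodes run as the Compute Engine default service account and that the repository is the test repository in asia-northeast2 from the question. The helper names default_compute_sa and grant_reader, the APPLY switch, and the project number 123456789012 are illustrative and not part of gcloud. By default the script only prints the command it would run.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build the Compute Engine default service account
# e-mail address from a numeric project number.
default_compute_sa() {
  printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

# Hypothetical helper: print the binding command for a repository ($1),
# a location ($2) and a service account ($3); run it only when APPLY=1.
grant_reader() {
  set -- gcloud artifacts repositories add-iam-policy-binding "$1" \
    --location="$2" \
    --member="serviceAccount:$3" \
    --role=roles/artifactregistry.reader
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

if [ "${APPLY:-0}" = "1" ]; then
  # Look up the real project number of the active gcloud project.
  project_id="$(gcloud config get-value project)"
  project_number="$(gcloud projects describe "$project_id" --format='value(projectNumber)')"
else
  project_number="123456789012"   # placeholder used for the dry run only
fi

grant_reader test asia-northeast2 "$(default_compute_sa "$project_number")"
```

Run it once as is to review the printed command, then with APPLY=1 to apply the binding. After that, delete the Pod and run it again so that the kubelet retries the pull.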
    

    If that does not help, try creating a new service account, granting it the permissions needed to pull the image, and then pulling the image again.
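
The suggestion to use a new service account might look roughly like the sketch below. It assumes that the account is attached to a newly created Autopilot cluster, since the node service account is chosen when the cluster is created. The names my-project, gke-node-sa and test-sa, the helpers sa_email and run, and the APPLY switch are all hypothetical. Without APPLY=1 the script only prints the commands.

```shell
#!/bin/sh
set -eu

# Illustrative values; replace them with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-gke-node-sa}"   # hypothetical service account name

# User-managed service accounts have the form
# NAME@PROJECT_ID.iam.gserviceaccount.com.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Print a command; execute it only when APPLY=1.
run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

SA_EMAIL="$(sa_email "$SA_NAME" "$PROJECT_ID")"

# 1. Create the service account.
run gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT_ID"

# 2. Let it read images from the "test" repository used in the question.
run gcloud artifacts repositories add-iam-policy-binding test \
  --project="$PROJECT_ID" --location=asia-northeast2 \
  --member="serviceAccount:$SA_EMAIL" --role=roles/artifactregistry.reader

# 3. Create a new Autopilot cluster (hypothetical name "test-sa") whose
#    nodes run as that account.
run gcloud container clusters create-auto test-sa \
  --project="$PROJECT_ID" --region=asia-northeast2 --service-account="$SA_EMAIL"
```

A node service account usually also needs the logging and monitoring roles that the GKE documentation lists for node service accounts. They are left out here to keep the sketch short.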

    Simple troubleshooting steps are:

    1. Ensure that your GKE cluster is allowed to access Artifact Registry. In the GKE console, check which service account the cluster's nodes use and that their access scopes include read access to storage.
    2. Check that the container image you are trying to pull actually exists in Artifact Registry, that is, that it was pushed correctly and can be accessed.
    3. Look at the error logs for more detail on what is causing the issue, and check the GKE documentation for more information on troubleshooting it.
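
The three checks above can be sketched as commands. This assumes the cluster and repository are both named test and live in asia-northeast2, as in the question. The project ID my-project, the helpers classify_pull_error and run, and the APPLY switch are hypothetical, and the error classification is only a rough heuristic. Without APPLY=1 the commands are printed, not executed, and the printed form loses the original quoting.

```shell
#!/bin/sh
set -eu

PROJECT_ID="${PROJECT_ID:-my-project}"   # placeholder project ID

# Rough heuristic (an assumption, not an official mapping): guess the cause
# from the kubelet "Failed to pull image" message.
classify_pull_error() {
  case "$1" in
    *"403 Forbidden"*) echo "permission" ;;
    *"not found"*)     echo "missing-image" ;;
    *)                 echo "unknown" ;;
  esac
}

# Print a command; execute it only when APPLY=1.
run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Step 1: which service account and scopes do the nodes use, and who may
# read the repository? (The --format field path is an assumption; drop
# --format to see the full cluster description.)
run gcloud container clusters describe test --region=asia-northeast2 \
  --format="yaml(nodeConfig.serviceAccount,nodeConfig.oauthScopes)"
run gcloud artifacts repositories get-iam-policy test --location=asia-northeast2

# Step 2: is the image really in the repository?
run gcloud artifacts docker images list "asia-northeast2-docker.pkg.dev/$PROJECT_ID/test"

# Step 3: read the pull error from the Pod's events.
run kubectl get events --field-selector involvedObject.name=test

# The message from the question points at permissions.
classify_pull_error "failed to fetch oauth token: unexpected status: 403 Forbidden"   # prints "permission"
```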