google-kubernetes-engine, google-anthos, google-anthos-service-mesh

Anthos Service Mesh Metrics


I recently deployed Anthos Service Mesh as a turnkey way to set up GKE with Istio. So far so good, but the one issue I'm seeing is that the basic pod metrics (CPU, memory, and disk) are not showing up.

When I look at the logs for the prometheus-to-sd pods, I'm seeing the following errors:

Error while sending request to Stackdriver googleapi: Error 403: Permission monitoring.timeSeries.create denied (or the resource may not exist)., forbidden

I see similar errors in the fluentd-gke pod logs:

Unable to export to Monitoring service because: GaxError RPC failed, caused by 7:Permission monitoring.timeSeries.create denied (or the resource may not exist).

I've tried adjusting the Workload Identity permissions (the mapping from the GCP service account to the Kubernetes service account), but no luck. Has anyone else run into this?

These are the instructions I've been following.

https://cloud.google.com/service-mesh/docs/gke-anthos-cli-new-cluster
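For reference, a GCP-SA-to-KSA Workload Identity mapping like the one mentioned above usually takes two steps: allowing the Kubernetes service account to impersonate the Google service account (roles/iam.workloadIdentityUser), then annotating the Kubernetes service account with the Google service account's email. The Python sketch below only builds and prints those gcloud/kubectl commands rather than running them; the function name and the project, service account, namespace, and KSA names are hypothetical placeholders.

```python
import shlex


def workload_identity_commands(project_id, gsa_name, namespace, ksa_name):
    """Build (but do not run) the commands for a GSA-to-KSA mapping.

    All argument values are hypothetical placeholders.
    """
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    # Workload Identity member format: PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
    member = f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"
    # Step 1: let the KSA impersonate the GSA.
    bind = [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
        "--role=roles/iam.workloadIdentityUser",
        f"--member={member}",
    ]
    # Step 2: annotate the KSA so GKE knows which GSA it maps to.
    annotate = [
        "kubectl", "annotate", "serviceaccount", ksa_name,
        f"--namespace={namespace}",
        f"iam.gke.io/gcp-service-account={gsa_email}",
    ]
    return [bind, annotate]


if __name__ == "__main__":
    # Hypothetical values; substitute your own.
    for cmd in workload_identity_commands(
        "my-project", "metrics-writer", "kube-system", "my-ksa"
    ):
        print(shlex.join(cmd))
```

As the solution explains, this mapping does not help pods that run with hostNetwork set to true.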


Solution

  • It turns out Workload Identity doesn't work with pods that have hostNetwork set to true. One would think that with Anthos, basic monitoring of pods and compute nodes would be enabled out of the box.

    There are two options to resolve this:

    1.) Grant the default Compute Engine service account the following roles:
    - roles/logging.logWriter
    - roles/monitoring.metricWriter
    - roles/monitoring.viewer

    2.) Deploy the node pools with a custom service account that has the roles above.
    

    To get things rolling, I used option #1.
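To check which pods are affected by the hostNetwork limitation, you can list the pods whose spec sets hostNetwork: true. The sketch below is a minimal Python filter over the JSON that `kubectl get pods --all-namespaces -o json` prints; the function name and the sample pod names are hypothetical.

```python
import json


def host_network_pods(pod_list_json):
    """Return "namespace/name" for every pod whose spec sets hostNetwork: true.

    Expects the JSON printed by `kubectl get pods -o json`
    (a List object whose "items" are Pod objects).
    """
    pod_list = json.loads(pod_list_json)
    flagged = []
    for pod in pod_list.get("items", []):
        # hostNetwork defaults to false when the field is omitted.
        if pod.get("spec", {}).get("hostNetwork", False):
            meta = pod.get("metadata", {})
            flagged.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return flagged


if __name__ == "__main__":
    # Hypothetical sample input standing in for real kubectl output.
    sample = json.dumps({
        "items": [
            {"metadata": {"namespace": "kube-system", "name": "agent-abcde"},
             "spec": {"hostNetwork": True}},
            {"metadata": {"namespace": "default", "name": "web-1"},
             "spec": {}},
        ]
    })
    for pod in host_network_pods(sample):
        print(pod)
```

With real data you could replace `sample` with `sys.stdin.read()` and pipe kubectl into the script. Pods it reports can't obtain credentials through Workload Identity and end up using the node's service account, which is why granting roles to that account (options 1 and 2) helps.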
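Both options can be sketched as gcloud commands. The Python below only assembles and prints them instead of executing anything. The project ID, project number, cluster, zone, node pool, and custom service account names are hypothetical placeholders; the default Compute Engine account follows the usual PROJECT_NUMBER-compute@developer.gserviceaccount.com pattern.

```python
import shlex

# Roles named in the solution.
ROLES = (
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
)


def default_compute_sa(project_number):
    """Email of the Compute Engine default service account."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def role_binding_commands(project_id, sa_email, roles=ROLES):
    """One project-level IAM binding command per role."""
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{sa_email}", f"--role={role}"]
        for role in roles
    ]


def create_sa_command(sa_name, display_name):
    """Create a custom service account (first step of option 2)."""
    return ["gcloud", "iam", "service-accounts", "create", sa_name,
            f"--display-name={display_name}"]


def node_pool_command(pool_name, cluster, zone, sa_email):
    """Create a node pool whose nodes run as the given service account."""
    return ["gcloud", "container", "node-pools", "create", pool_name,
            f"--cluster={cluster}", f"--zone={zone}",
            f"--service-account={sa_email}"]


if __name__ == "__main__":
    # Hypothetical values; substitute your own.
    project_id, project_number = "my-project", "123456789012"

    print("# Option 1: grant the roles to the default Compute Engine account")
    for cmd in role_binding_commands(project_id, default_compute_sa(project_number)):
        print(shlex.join(cmd))

    print("# Option 2: custom service account plus a new node pool")
    sa_name = "gke-nodes"
    sa_email = f"{sa_name}@{project_id}.iam.gserviceaccount.com"
    print(shlex.join(create_sa_command(sa_name, "GKE node service account")))
    for cmd in role_binding_commands(project_id, sa_email):
        print(shlex.join(cmd))
    print(shlex.join(node_pool_command("metrics-pool", "my-cluster", "us-central1-a", sa_email)))
```

Printing rather than executing keeps the sketch free of side effects, so you can review the commands before running them. Option 2 is shown as creating a new node pool, matching the solution's wording.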