I'm using Google's managed collection on my GKE cluster (v1.24.26) and I can't find a way to collect metrics related to Kubernetes cronjobs. I can't find kube_cronjob_next_schedule_time, kube_job_status_failed nor kube_job_status_succeeded.
Do I need to configure something specific to gather these metrics on GKE?
I tried restarting kube-state-metrics-0 and restarting the collectors, but nothing worked.
Ok, this threw me too.
I realized (belatedly) that kube-state-metrics creates both a PodMonitoring and a ClusterPodMonitoring.
The PodMonitoring resource scrapes metrics published by the Pod created by statefulset.apps/kube-state-metrics on the Pod's metrics-self port (8081). The ClusterPodMonitoring scrapes metrics published on the Pod's metrics port (8080), but its relabeling rule does not keep cronjob-related metrics:
kubectl get clusterpodmonitoring/kube-state-metrics \
--output=jsonpath="{.spec.endpoints[0].metricRelabeling[0]}" \
| jq -r .
{
"action": "keep",
"regex": "kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?",
"sourceLabels": [
"__name__"
]
}
NOTE The regex does not include kube_cronjob and, of the kube_job metrics, only includes kube_job_created patterns.
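To see which metric names survive the keep rule, you can test them against the regex locally. This is a minimal sketch: `is_kept` is a hypothetical helper, and `grep -Ex` only approximates Prometheus' fully anchored RE2 matching (for a simple alternation like this one the two should agree).

```shell
#!/usr/bin/env bash
# Regex copied from the ClusterPodMonitoring "keep" rule shown above.
KEEP='kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?'

# is_kept NAME REGEX: succeeds if NAME matches REGEX in full.
# grep -x anchors the match to the whole line, which approximates
# the implicit anchoring of Prometheus relabeling regexes.
is_kept() {
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

for name in kube_pod_info kube_job_created kube_job_status_failed kube_cronjob_next_schedule_time; do
  if is_kept "$name" "$KEEP"; then
    echo "keep ${name}"
  else
    echo "drop ${name}"
  fi
done
```

With the default regex, the two metrics from the question (kube_job_status_failed and kube_cronjob_next_schedule_time) are reported as dropped.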
You will need to extend the regex to cover the kube_cronjob and kube_job metrics that you want.
Of course, the better approach is to edit the Google-provided YAML (kube-state-metrics.yaml#L324) before you install Kube State Metrics.
If you've already deployed Kube State Metrics, one way (!) to do this is to kubectl patch the clusterpodmonitoring resource:
VALUE="kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler)(_.+)?"
PATCH="
[
{
'op':'replace',
'path': '/spec/endpoints/0/metricRelabeling/0/regex',
'value':'${VALUE}'
}
]"
kubectl patch clusterpodmonitoring/kube-state-metrics \
--type=json \
--patch="${PATCH}"
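After patching, it may be worth reading the regex back to confirm the change was applied. A small sketch: `current_keep_regex` is a hypothetical helper name, and the kubectl call is the same lookup used earlier, narrowed to the `regex` field.

```shell
#!/usr/bin/env bash
# current_keep_regex: print the regex of the first metricRelabeling rule
# on the kube-state-metrics ClusterPodMonitoring.
# (Hypothetical helper; assumes the rule is still at index 0.)
current_keep_regex() {
  kubectl get clusterpodmonitoring/kube-state-metrics \
    --output=jsonpath='{.spec.endpoints[0].metricRelabeling[0].regex}'
}

# Usage against a live cluster:
#   current_keep_regex
# The output should now contain "cronjob" and a plain "job" alternative.
```

If the printed value still shows the original regex, the patch did not apply (or the resource was reconciled back), so check the output of kubectl patch first.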
NOTE This (VALUE) includes 2 changes: it adds kube_cronjob_* metrics, and it adds all kube_job_* metrics (replacing the now-redundant kube_job_created_* pattern).
You can confirm that the metrics are now reaching Cloud Monitoring using Metrics Explorer with PromQL or native MQL (prometheus.googleapis.com/kube_cronjob_next_schedule_time/gauge), or using APIs Explorer for Cloud Monitoring's Prometheus API:
PROJECT="..." # Your Project ID
ENDPOINT="https://monitoring.googleapis.com/v1/projects/${PROJECT}/location/global/prometheus/api/v1/query"
TOKEN="$(gcloud auth print-access-token)"
METRIC="kube_cronjob_next_schedule_time"
curl \
--silent \
--request POST \
--header "Authorization: Bearer ${TOKEN}" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data "{\"query\":\"${METRIC}\"}" \
"${ENDPOINT}" \
| jq -r .
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "kube_cronjob_next_schedule_time",
"cluster": "...",
"cronjob": "hello",
"instance": "kube-state-metrics-0:metrics",
"job": "kube-state-metrics",
"location": "...",
"namespace": "test",
"project_id": "..."
},
"value": [
1703893639.8,
"1703893680"
]
}
]
}
}
NOTE In this case I'd created a CronJob called hello in the test namespace.
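The two numbers in the result's value array are both Unix timestamps: the first is when the sample was evaluated, the second is the metric value, i.e. the next scheduled run. A rough sketch for reading them, using the values from the response above (the variable names are mine, and the `date -d "@..."` form assumes GNU date):

```shell
#!/usr/bin/env bash
# Values copied from the sample response above.
SAMPLE_TS="1703893639.8"   # evaluation time of the sample (Unix seconds)
NEXT_RUN="1703893680"      # kube_cronjob_next_schedule_time (Unix seconds)

# Drop the fractional part so plain shell integer arithmetic works.
SECONDS_LEFT=$(( NEXT_RUN - ${SAMPLE_TS%.*} ))
echo "next run in about ${SECONDS_LEFT}s"

# Human-readable UTC time. GNU date syntax; on macOS/BSD use:
#   date -u -r "${NEXT_RUN}"
date -u -d "@${NEXT_RUN}" +%Y-%m-%dT%H:%M:%SZ
```

For the sample above this gives roughly 41 seconds until the next run, which is consistent with a CronJob scheduled every minute.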