I am using HCP trying to deploy Grafana meta-monitoring and alloy helm chart using terraform helm_release and it is not successful and errors in context deadline passed but if I use helm command the helm chart is deployed without any error
resource "helm_release" "meta_monitoring" {
name = "meta-monitoring"
repository = "https://grafana.github.io/helm-charts"
chart = "meta-monitoring"
version = "1.3.0"
namespace = "meta"
dependency_update = true
depends_on = [kubernetes_secret.minio]
values = [
"${file("./helm_charts_values/meta/values.yaml")}"
]
}
resource "helm_release" "alloy" {
name = "alloy"
repository = "https://grafana.github.io/helm-charts"
chart = "alloy"
version = "0.10.0"
namespace = "soc-loki"
depends_on = [kubernetes_namespace.loki]
values = [
"${file("./helm_charts_values/alloy/values.yaml")}"
]
}
dependent resource have no problem
resource "kubernetes_namespace" "meta" {
metadata {
name = "meta"
}
}
resource "kubernetes_secret" "minio" {
metadata {
name = "minio"
namespace = kubernetes_namespace.meta.metadata[0].name
}
data = {
rootUser = "xxxxxx"
rootPassword = "xxxxx"
}
type = "Opaque"
}
If I use helm command directly it works
helm upgrade --install meta-monitoring grafana/meta-monitoring -n meta -f meta/values.yaml
There is no connection problem to aks as other charts are being deployed successfully
Values.yaml for reference for meta-monitoring
namespacesToMonitor:
- soc-loki
cloud:
logs:
enabled: false
metrics:
enabled: false
traces:
enabled: false
local:
grafana:
enabled: true
logs:
enabled: true
metrics:
enabled: true
traces:
enabled: true
minio:
enabled: true
Other chart which is successful
resource "helm_release" "alert_manager" {
name = "alertmanager"
repository = "https://prometheus-community.github.io/helm-charts"
chart = "alertmanager"
version = "1.13.1"
namespace = "soc-loki"
depends_on = [kubernetes_namespace.loki]
values = [
"${file("./helm_charts_values/alertmanager/values.yaml")}"
]
}
Here is how I solved it.
By setting wait=false
explicitly allows the helm release to be marked as successful as well as all the resource of helm chart are running perfectly for example pods in the ready state
resource "helm_release" "alloy" {
name = "alloy"
repository = "https://grafana.github.io/helm-charts"
chart = "alloy"
version = "0.10.0"
namespace = "soc-loki"
depends_on = [kubernetes_namespace.loki, helm_release.meta_monitoring]
wait = false
values = [
"${file("./helm_charts_values/alloy/values.yaml")}"
]
}
resource "helm_release" "meta_monitoring" {
name = "meta-monitoring"
repository = "https://grafana.github.io/helm-charts"
chart = "meta-monitoring"
version = "1.3.0"
namespace = "meta"
dependency_update = true
depends_on = [kubernetes_secret.minio]
wait = false
values = [
"${file("./helm_charts_values/meta/values.yaml")}"
]
}
As per docs, wait
Will wait until all resources are in a ready state before marking the release as successful. Defaults to true.
Here is my guess what is happening
There might be some jobs related to alloy and meta monitoring which might be set to run post-install. When I use manual helm command all the resource take around 2-3 minutes to be in ready state
Post install jobs will not launch until the release has been marked successful.
If wait=true
in the helm_release, terraform does not mark the release as successful until all resources are in a ready state.This effectively creates a deadlock when resources depend on the result of the running jobs.
TO understand more I might need to go in detail how meta-monitoring and allow helm charts are being deployed to understand the floe but this is the closest guess for me
Ref: https://github.com/hashicorp/terraform-provider-helm/issues/683#issuecomment-830872443