I have the following Terraform to configure a CloudWatch alarm, which should trigger if the desired count of an ECS service is greater than the running count:
resource "aws_cloudwatch_metric_alarm" "ecs_task_count_alarm" {
for_each = toset(var.ecs_services) # Converts list to a set to iterate over unique service names
alarm_name = "ECS_Tasks_Running_Check_${each.key}"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 1
threshold = 0
alarm_description = "Alarm if DesiredTaskCount is greater than the RunningTaskCount for the ECS service ${each.key}."
# Metric query for RunningTaskCount
metric_query {
id = "running_task_count"
metric {
metric_name = "RunningTaskCount"
namespace = "AWS/ECS"
period = 60 # 1 minute
stat = "Average"
dimensions = {
ClusterName = var.ecs_cluster_name
ServiceName = each.key
}
}
}
# Metric query for DesiredTaskCount
metric_query {
id = "desired_task_count"
metric {
metric_name = "DesiredTaskCount"
namespace = "AWS/ECS"
period = 60 # 1 minute
stat = "Average"
dimensions = {
ClusterName = var.ecs_cluster_name
ServiceName = each.key
}
}
}
# Math expression to check if RunningTaskCount is less than DesiredTaskCount
metric_query {
id = "number_of_tasks_we_want_but_arent_getting"
expression = "desired_task_count - running_task_count"
label = "Difference between Desired and Running Task Counts"
return_data = true
}
alarm_actions = [aws_sns_topic.sns_alarms.arn]
insufficient_data_actions = [aws_sns_topic.sns_alarms.arn]
ok_actions = [aws_sns_topic.sns_alarms.arn]
}
However, the alarm is reporting insufficient data. ecs_services is a list of the service names ["foo", "bar"], and ecs_cluster_name is the name of the cluster this is running in. What am I missing?
An alarm stuck in INSUFFICIENT_DATA usually means one of two things: the metric is new and has no data points yet, or the namespace, metric name, or dimensions don't match anything CloudWatch is actually receiving.
In your case, the issue is where you're looking for the metrics. The AWS/ECS namespace only publishes the basic default metrics, CPUUtilization and MemoryUtilization. Task-level metrics such as DesiredTaskCount and RunningTaskCount live in the ECS/ContainerInsights namespace, and that namespace is only populated when Container Insights is enabled on the cluster.
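If Container Insights isn't already turned on, you can enable it on the cluster. A minimal sketch, assuming the cluster is also managed in Terraform (the resource label "this" is illustrative):

resource "aws_ecs_cluster" "this" {
  name = var.ecs_cluster_name

  # Publishes the ECS/ContainerInsights metrics (RunningTaskCount, DesiredTaskCount, etc.)
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

Keep in mind that Container Insights metrics are billed as custom metrics, so enabling it adds some CloudWatch cost.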
A quick tip: always verify the exact namespace, metric names, and dimensions in the CloudWatch console before writing the alarm; the alarm only leaves INSUFFICIENT_DATA if its metric queries match data that is actually being published, so checking first saves a lot of time and helps avoid issues like this.
Updated Code
Here's how the two metric queries in your Terraform configuration should look after switching them to the ECS/ContainerInsights namespace:
  # Metric query for RunningTaskCount
  metric_query {
    id = "running_task_count"

    metric {
      metric_name = "RunningTaskCount"
      namespace   = "ECS/ContainerInsights"
      period      = 60 # 1 minute
      stat        = "Average"

      dimensions = {
        ClusterName = var.ecs_cluster_name
        ServiceName = each.key
      }
    }
  }

  # Metric query for DesiredTaskCount
  metric_query {
    id = "desired_task_count"

    metric {
      metric_name = "DesiredTaskCount"
      namespace   = "ECS/ContainerInsights"
      period      = 60 # 1 minute
      stat        = "Average"

      dimensions = {
        ClusterName = var.ecs_cluster_name
        ServiceName = each.key
      }
    }
  }
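For completeness, here's a sketch of the variable declarations the alarm assumes, matching how you described them (the default values are just the illustrative ones from the question):

variable "ecs_services" {
  description = "Names of the ECS services to alarm on"
  type        = list(string)
  default     = ["foo", "bar"] # illustrative values from the question
}

variable "ecs_cluster_name" {
  description = "Name of the ECS cluster the services run in"
  type        = string
}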