Tags: amazon-web-services, amazon-cloudwatch, amazon-ecs, autoscaling, amazon-cloudwatch-metrics

AWS/ECS CPUUtilization average vs maximum


After reading the AWS documentation I am still not clear about the CloudWatch metric statistics Average and Maximum, specifically for ECS CPUUtilization.

I have an AWS ECS cluster with a Fargate setup and a service with a minimum count of 2 healthy tasks. I have enabled autoscaling using the AWS/ECS CPUUtilization metric for my ClusterName and ServiceName. A CloudWatch alarm is configured to trigger when average CPU utilization is more than 75% over a one-minute period for 3 data points.
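For reference, a minimal boto3 sketch of an alarm like the one described above might look roughly like this (the region, cluster, service, and alarm names are placeholders, not taken from the question):

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm on the Average statistic of AWS/ECS CPUUtilization:
# breach when the 1-minute average exceeds 75% for 3 consecutive data points.
cloudwatch.put_metric_alarm(
    AlarmName="my-service-cpu-high",                      # placeholder name
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},   # placeholder
        {"Name": "ServiceName", "Value": "my-service"},   # placeholder
    ],
    Statistic="Average",
    Period=60,                 # 1-minute period
    EvaluationPeriods=3,       # 3 data points
    Threshold=75.0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="missing",
)
```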

I also have a health check set up with a frequency of 30 seconds and a timeout of 5 minutes.

I ran a performance script to test the autoscaling behavior, but I am noticing that the service gets marked as unhealthy and new tasks get created. When I check the CPUUtilization metric, the Average statistic shows around 44% utilization but the Maximum statistic shows more than 100%; screenshots attached.
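To see how the two statistics diverge for the same metric and period, a boto3 query along these lines can pull both at once (the cluster/service names and the time window are placeholders):

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)

# Fetch Average and Maximum for the same metric and period so they line up.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},   # placeholder
        {"Name": "ServiceName", "Value": "my-service"},   # placeholder
    ],
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Average", "Maximum"],
)

for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
    print(dp["Timestamp"], f"avg={dp['Average']:.1f}%", f"max={dp['Maximum']:.1f}%")
```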

[Screenshot: CPUUtilization, Average statistic]

[Screenshot: CPUUtilization, Maximum statistic]

So what do Average and Maximum mean here? Does Average mean the average CPU utilization across both of my tasks, and does Maximum mean that one of my tasks' CPU utilization went above 100%?
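For intuition on how the two statistics are computed: within each period, Average is the mean of all reported samples (from both tasks), while Maximum is the single highest sample. A toy illustration with made-up sample values:

```python
# Hypothetical CPUUtilization samples reported by two tasks within one 1-minute period.
task_a = [35.0, 40.0, 38.0, 42.0]
task_b = [45.0, 50.0, 48.0, 130.0]   # one short burst above 100%

samples = task_a + task_b

average = sum(samples) / len(samples)   # what the Average statistic reports
maximum = max(samples)                  # what the Maximum statistic reports

print(f"Average: {average:.1f}%")   # 53.5%
print(f"Maximum: {maximum:.1f}%")   # 130.0%
```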


Solution

  • Average and Maximum here measure the average CPU usage over a 1-minute period and the maximum CPU usage over a 1-minute period, respectively.

    In terms of configuring autoscaling rules, you want to use the Average statistic.

    The Maximum statistic usually reflects short burst spikes, which can be caused by things like garbage collection.

    The Average statistic, however, tracks the typical CPU usage: roughly half of the samples sit above it and half below. (Strictly speaking that description fits the median rather than the mean, but for most CPU-usage distributions the difference doesn't matter much here.)

    You most likely want to scale out on the Average statistic when your CPU goes above, say, 75-85% (keep in mind that you need to give new tasks time to warm up); see the target-tracking sketch after this list.

    The Maximum statistic can generally be ignored for autoscaling use cases.
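One common way to act on that advice is a target-tracking policy on the service's average CPU via Application Auto Scaling. A rough sketch follows; the resource names, capacities, and cooldowns are illustrative assumptions, not values from the question:

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Register the ECS service as a scalable target (placeholder names, 2-10 tasks).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # placeholder
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target-tracking policy that keeps the *average* CPU utilization around 75%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",             # placeholder
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # placeholder
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 75.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        # Cooldowns give new tasks time to warm up before further scaling decisions.
        "ScaleOutCooldown": 120,
        "ScaleInCooldown": 300,
    },
)
```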