
CloudWatch Metrics for Volume IOPS, Volume Throughput (MiB/s) and Network (Gbps)


I had to troubleshoot an application on AWS, and it was not easy to use the CloudWatch metrics graphs to judge the health of the environment, so I decided to share my experience here.

CloudWatch gives us metrics for CPU, Memory*, Disk and Network.

* To get memory metrics you need to install the CloudWatch agent.

CPU and Memory are reported as percentages, which are clear and straightforward to interpret. Disk and Network are not that easy: for example, I wanted to check IOPS and throughput (MiB/s) for my volumes, and network bandwidth (Gbps) for my instance.

I needed those values because AWS defines EBS limits in IOPS and throughput (MiB/s), and instance network limits in Gbps.


Solution

  • Total IOPS
    An EBS volume gives us the metrics VolumeReadOps and VolumeWriteOps. To quote the AWS documentation:

    VolumeReadOps - The total number of read operations in a specified period of time.
    To calculate the average read operations per second (read IOPS) for the period, divide the total read operations in the period by the number of seconds in that period.

    VolumeWriteOps - The total number of write operations in a specified period of time.
    To calculate the average write operations per second (write IOPS) for the period, divide the total write operations in the period by the number of seconds in that period.

    Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cloudwatch_ebs.html

    To get the total IOPS we need to calculate (VolumeReadOps + VolumeWriteOps) / SecondsInPeriod.
    Luckily, CloudWatch helps us here with metric math expressions. Use the expression below; the PERIOD function returns the period of a metric in seconds, so it is our friend here.

    m1 = VolumeWriteOps - Sum
    m2 = VolumeReadOps - Sum
    Expression: (m1+m2)/PERIOD(m1)
    

    Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
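
To sanity-check the numbers outside the console, the same calculation can be sketched in Python. This is only a minimal sketch: the function names `total_iops` and `build_total_iops_queries` and the volume ID are made up for illustration, and the query list only mirrors the m1/m2/expression setup above in the shape I would expect to pass to the CloudWatch GetMetricData API. The code itself does not call AWS.

```python
def total_iops(read_ops_sum, write_ops_sum, period_seconds):
    """Average total IOPS: (reads + writes) in the period / seconds in the period."""
    return (read_ops_sum + write_ops_sum) / period_seconds


def build_total_iops_queries(volume_id, period_seconds=300):
    """Build a MetricDataQueries list mirroring m1, m2 and (m1+m2)/PERIOD(m1).

    volume_id is a placeholder supplied by the caller; nothing here calls AWS.
    """
    def ebs_sum(query_id, metric_name):
        # One EBS metric with the Sum statistic, as in "m1 = VolumeWriteOps - Sum".
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EBS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the expression result is returned
        }

    return [
        ebs_sum("m1", "VolumeWriteOps"),
        ebs_sum("m2", "VolumeReadOps"),
        {"Id": "iops", "Expression": "(m1+m2)/PERIOD(m1)", "Label": "Total IOPS"},
    ]


# Worked example: 90,000 reads + 60,000 writes over a 5-minute (300 s) period.
print(total_iops(90_000, 60_000, 300))  # 500.0

queries = build_total_iops_queries("vol-0123456789abcdef0")  # hypothetical volume ID
print(queries[2]["Expression"])  # (m1+m2)/PERIOD(m1)
```

With boto3 installed and credentials configured, a list like `queries` could presumably be passed as `MetricDataQueries` to `boto3.client("cloudwatch").get_metric_data(...)` together with a `StartTime` and `EndTime`; I have left that call out so the sketch runs without an AWS account.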


  • Total Throughput (MiB/s)
    An EBS volume gives us the metrics VolumeReadBytes and VolumeWriteBytes. To quote the AWS documentation:

    VolumeReadBytes - Provides information on the read operations in a specified period of time. The Sum statistic reports the total number of bytes transferred during the period.

    VolumeWriteBytes - Provides information on the write operations in a specified period of time. The Sum statistic reports the total number of bytes transferred during the period.

    Both metrics give us the value in bytes, but we want it in MiB, so we need to divide by 1048576 (1024 * 1024). In detail:

    1024 bytes = 1 KiB
    1024 KiB = 1 MiB
    

    To get the total throughput in MiB/s we need to calculate ((VolumeReadBytes + VolumeWriteBytes) / 1048576) / SecondsInPeriod.
    Use the expression below; again, the PERIOD function is our friend here.

    m1 = VolumeWriteBytes - Sum
    m2 = VolumeReadBytes - Sum
    Expression: ((m1+m2)/1048576)/PERIOD(m1)
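
The throughput conversion can be checked with a small Python sketch. The function name `total_throughput_mib_s` and the sample byte counts are my own illustration, not CloudWatch output.

```python
BYTES_PER_MIB = 1024 * 1024  # 1048576


def total_throughput_mib_s(read_bytes_sum, write_bytes_sum, period_seconds):
    """Convert VolumeReadBytes + VolumeWriteBytes (Sum, in bytes) to MiB per second."""
    return ((read_bytes_sum + write_bytes_sum) / BYTES_PER_MIB) / period_seconds


# Worked example: a volume that read at 150 MiB/s and wrote at 50 MiB/s
# for a whole 300-second period.
read_bytes = 150 * BYTES_PER_MIB * 300
write_bytes = 50 * BYTES_PER_MIB * 300
print(total_throughput_mib_s(read_bytes, write_bytes, 300))  # 200.0
```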
    

  • Total Network (Gbps)
    An EC2 instance gives us the metrics NetworkIn and NetworkOut. To quote the AWS documentation:

    NetworkIn - The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.
    The number reported is the number of bytes received during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60.

    NetworkOut - The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.
    The number reported is the number of bytes sent during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60.

    Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html

    Both metrics give us the value in bytes per period, but we want it in gigabits per second.
    To convert from "per period" to "per second" we just need to divide by the period in seconds, which is 300 here because I am using basic (five-minute) monitoring.

    To convert from bytes to gigabits we need to multiply by 8 and divide by 1000 three times (1000 * 1000 * 1000 = 1000000000). In detail:

    1000 bits = 1 kilobit
    1000 kilobits = 1 megabit
    1000 megabits = 1 gigabit
    1 byte = 8 bits
    

    To get the total network bandwidth in Gbps we need to calculate (((NetworkIn + NetworkOut) / 300) * 8) / 1000000000.

    m1 = NetworkIn - Sum
    m2 = NetworkOut - Sum
    Expression: (((((m1+m2)/300)/1000)/1000)/1000)*8
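
The network conversion can be double-checked the same way in Python. This is a rough sketch with a made-up function name (`total_network_gbps`) and invented sample values. The period is a parameter here instead of a hard-coded 300, so the same function also covers detailed (one-minute) monitoring.

```python
def total_network_gbps(network_in_bytes, network_out_bytes, period_seconds):
    """Convert NetworkIn + NetworkOut (Sum, in bytes per period) to gigabits per second."""
    bytes_per_second = (network_in_bytes + network_out_bytes) / period_seconds
    bits_per_second = bytes_per_second * 8       # 1 byte = 8 bits
    return bits_per_second / 1000 / 1000 / 1000  # bits -> kilobits -> megabits -> gigabits


# Worked example: 112.5 GB received and 75 GB sent over a 300-second period.
print(total_network_gbps(112_500_000_000, 75_000_000_000, 300))  # 5.0
```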