Tags: amazon-web-services, monitoring, amazon-cloudwatch, video-conferencing, network-monitoring

Network data out: nmon/nload vs AWS CloudWatch disparity


We are running a video conferencing server on an EC2 instance.

Because this is an egress-heavy app and we are charged heavily for data out, we want to monitor network data out closely.

[Screenshot: nload (left), nmon (top right), and the AWS CloudWatch network out (bytes) graph (bottom right) during the test]

As the screenshot above shows, during our test on the EC2 server nload (left) reports network out as 138 Mbit/s and nmon (top right) reports 17263 KB/s. These two readings agree closely (138 / 8 = 17.25 MB/s).
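To make the comparison between the two tools concrete, here is a minimal Python sketch of the unit conversion. It assumes nload's "Mbit/s" means decimal megabits (10^6 bits) and treats nmon's "KB/s" as KiB/s (1024 bytes); both are assumptions, so check how your versions of the tools are configured. The function names are hypothetical helpers, not part of either tool.

```python
# Convert the readings from nload (Mbit/s) and nmon (KB/s) to bytes per second.
# Assumptions: nload's Mbit = 10**6 bits; nmon's KB is treated as 1024 bytes.

BITS_PER_BYTE = 8


def mbit_per_s_to_bytes_per_s(mbit_per_s: float) -> float:
    """Decimal megabits per second -> bytes per second."""
    return mbit_per_s * 1_000_000 / BITS_PER_BYTE


def kib_per_s_to_bytes_per_s(kib_per_s: float) -> float:
    """Kibibytes per second -> bytes per second."""
    return kib_per_s * 1024


if __name__ == "__main__":
    nload = mbit_per_s_to_bytes_per_s(138)   # 17,250,000 B/s
    nmon = kib_per_s_to_bytes_per_s(17263)   # 17,677,312 B/s
    print(f"nload: {nload / 1e6:.2f} MB/s")
    print(f"nmon:  {nmon / 1e6:.2f} MB/s")
    print(f"relative difference: {abs(nload - nmon) / nmon:.1%}")
```

Whether nmon's "KB" is 1000 or 1024 bytes, the two readings land within a few percent of each other, at roughly 17 MB/s.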

But when we check network out (bytes) in AWS CloudWatch (bottom right), the value shown is much higher (~1 GB/s), which seems more plausible for the test we are running, and it is the number we are ultimately charged for.

Why is there such a big difference between nmon/nload and AWS CloudWatch? Are we missing something here, or are we reading the AWS CloudWatch metrics incorrectly?

Thank you for your help!

Edit:

Adding a screenshot of a longer test: AWS CloudWatch shows the average network out metric flat at around 1 GB for the whole test, while nmon shows an average network out of 15816 KB/s.

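[Screenshot: longer test, with AWS CloudWatch average network out flat at around 1 GB and nmon average network out of 15816 KB/s]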


Solution

  • Just figured out the answer to this.

    The CloudWatch concepts documentation explains how data is aggregated over periods: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html

    Periods

    A period is the length of time associated with a specific Amazon CloudWatch statistic. Each statistic represents an aggregation of the metrics data collected for a specified period of time. Periods are defined in numbers of seconds, and valid values for period are 1, 5, 10, 30, or any multiple of 60. For example, to specify a period of six minutes, use 360 as the period value. You can adjust how the data is aggregated by varying the length of the period. A period can be as short as one second or as long as one day (86,400 seconds). The default value is 60 seconds.

    Only custom metrics that you define with a storage resolution of 1 second support sub-minute periods. Even though the option to set a period below 60 is always available in the console, you should select a period that aligns to how the metric is stored. For more information about metrics that support sub-minute periods, see High-resolution metrics.


    As the documentation quoted above explains, unless you publish a high-resolution custom metric (stored at 1-second resolution), CloudWatch does not keep sub-minute data. So the finest granularity available is one data point per minute.

    In our case, all the bytes sent out during each 60-second window are aggregated into a single data point. The network out value is therefore bytes sent per minute, not bytes per second.

    Even if I change the statistic to Average and the period to 1 second, the console still shows one data point per minute.

    So if I divide the 1.01 GB shown by AWS CloudWatch by 60, I get the per-second rate: roughly 16.8 MB/s, which is very close to what nmon and nload report.
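The reasoning in this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CloudWatch's implementation: it takes a hypothetical steady egress of 16.8 MB/s sampled once per second, sums those samples into 60-second periods (so each data point holds the bytes sent during its period), and then divides each period's total by the period length to recover a per-second rate. The dicts mimic the shape of the `Datapoints` entries returned by boto3's `get_metric_statistics` (`Timestamp`, `Sum`, `Unit`); the sample values and the helper names `aggregate` and `to_bytes_per_second` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

PERIOD_SECONDS = 60  # assumed period: one data point per minute


def aggregate(samples, period=PERIOD_SECONDS, start=None):
    """Sum per-second byte counts into fixed-length periods.

    `samples` holds the bytes sent in each consecutive second (hypothetical
    data). Only complete periods are emitted; a trailing partial one is dropped.
    """
    if start is None:
        start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    datapoints = []
    for i in range(0, len(samples) - period + 1, period):
        chunk = samples[i:i + period]
        datapoints.append({
            "Timestamp": start + timedelta(seconds=i),
            "Sum": float(sum(chunk)),  # bytes sent during this period
            "Unit": "Bytes",
        })
    return datapoints


def to_bytes_per_second(datapoints, period=PERIOD_SECONDS):
    """Convert per-period byte totals into average bytes per second."""
    return [dp["Sum"] / period for dp in datapoints]


if __name__ == "__main__":
    # Hypothetical steady egress of 16.8 MB/s for three minutes.
    samples = [16_800_000] * 180
    points = aggregate(samples)
    for dp, rate in zip(points, to_bytes_per_second(points)):
        print(f"{dp['Timestamp']:%H:%M}  {dp['Sum'] / 1e9:.3f} GB per period"
              f" -> {rate / 1e6:.1f} MB/s")
```

Each one-minute data point holds about 1 GB even though the instantaneous rate never exceeds 16.8 MB/s, which matches the pattern in the screenshots. If your data points cover a different period (for example, EC2 basic monitoring publishes every five minutes), divide by that period's length in seconds instead of 60.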