amazon-web-services amazon-eks cni

Intermittent ec2ApiErrCount in EKS CNI Metrics Helper


I've noticed that the ec2ApiErrCount metric sometimes reports non-zero values.

To investigate potential EC2 API errors, I enabled CloudTrail management events. However, I didn't find any corresponding EC2 API error logs in CloudTrail.

Here is my query snippet:

fields @timestamp, @log, eventName
| filter eventSource like "ec2.amazonaws.com"
| filter userAgent like "amazon-vpc-cni"
| sort @timestamp desc

Are there other factors or logs that I should check to understand the discrepancy between the ec2ApiErrCount metric and CloudTrail logs?

What could be causing ec2ApiErrCount to report non-zero values?



Solution

  • Here is a CloudWatch Logs Insights query that returns only the failed EC2 API calls made by the VPC CNI:

    fields @timestamp, userAgent, eventName
    | filter not isempty(errorCode)
    | filter eventSource like "ec2.amazonaws.com"
    | filter userAgent like "amazon-vpc-cni"
    | sort @timestamp desc
    | limit 100
    

    With this query I found some DeleteNetworkInterface errors such as "Network interface 'eni-xxxx' is currently in use."

    This error is common, however, and is handled by a retry loop.
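
To illustrate how a retried call can still leave an error counter non-zero, here is a minimal sketch. It is not the VPC CNI's actual code (the CNI is written in Go); `FakeEC2`, `ENIInUseError`, `delete_eni_with_retry`, and the `api_err_count` key are all hypothetical names. It assumes the counter is incremented on every failed attempt, so the metric can rise even when the deletion eventually succeeds.

```python
import time


class ENIInUseError(Exception):
    """Stand-in for the "Network interface ... is currently in use" error."""


class FakeEC2:
    """Hypothetical client that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def delete_network_interface(self, eni_id):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ENIInUseError(
                f"Network interface '{eni_id}' is currently in use."
            )
        return True


def delete_eni_with_retry(client, eni_id, metrics, max_attempts=5, delay_seconds=0.0):
    """Retry the delete; count every failed attempt (assumed behaviour)."""
    for attempt in range(1, max_attempts + 1):
        try:
            client.delete_network_interface(eni_id)
            return attempt  # number of attempts it took to succeed
        except ENIInUseError:
            # Each failed attempt is recorded, even though a later retry may succeed.
            metrics["api_err_count"] += 1
            time.sleep(delay_seconds)
    raise RuntimeError(f"giving up on {eni_id} after {max_attempts} attempts")


metrics = {"api_err_count": 0}
attempts = delete_eni_with_retry(FakeEC2(failures_before_success=2), "eni-xxxx", metrics)
print(attempts, metrics["api_err_count"])  # prints "3 2"
```

In this sketch the deletion succeeds on the third attempt, yet the counter ends at 2. Under the same assumption, a non-zero ec2ApiErrCount would not by itself mean that an operation ultimately failed.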