amazon-web-services amazon-cloudwatch cloudwatch-alarms

How to view the log entries behind CloudWatch metrics for API Gateway?


I am setting up CloudWatch and want to track when my serverless setup (API Gateway -> various Lambdas) returns 5xx errors. Then I want to understand what caused each error.

I see that to set an alarm, I first need a metric. Conveniently, there are built-in metrics for API Gateway: go to Metrics -> All metrics -> AWS namespaces -> ApiGateway to find 4XXError, 5XXError, etc.

So I created an alarm on the 5XXError metric for a stage of, let's say, my-example-gateway, and the alarm was recently triggered. Now that I know my application is returning 5xx errors, my question is: how do I drill down to individual log entries to see more info?

Interestingly, I did not yet have logging set up for my-example-gateway (https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-logging.html). I have now set it up and can view execution logs for the gateway. But how and where does AWS collect the logs behind the default metrics mentioned above (5XXError)?

Before I set up the execution logs mentioned above, Logs -> Log Groups showed no log group for my-example-gateway. Now there is one: API-Gateway-Execution-Logs_<my-example-gateway-id>/<stage>

Next to my graph in the metrics dashboard, I see the options View Logs and View in Metrics, but View Logs doesn't take me anywhere useful, just to the default Log Groups page listing all of my log groups.


Adding some resources I've found for future searches:

By default, API Gateway publishes 4XXError, 5XXError, CacheHitCount, CacheMissCount, Count, IntegrationLatency and Latency per API. Once detailed metrics for API Gateway are enabled, all of the above metrics are also emitted to CloudWatch with the dimensions ApiName, Method, Resource and Stage.

Note that not all metrics are always emitted, e.g. 4XXError or 5XXError may not show up if there are no errors.

https://docs.aws.amazon.com/apigateway/latest/developerguide/monitoring-cloudwatch.html
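
Because periods without errors may have no datapoint at all, it can help to read the metric programmatically and keep only the periods where something actually failed. The following is a minimal sketch, not a definitive implementation: it assumes a REST API reported under the ApiName and Stage dimensions, the function names `error_windows` and `fetch_5xx_datapoints` are made up for illustration, and the boto3 call needs AWS credentials.

```python
from datetime import timedelta


def error_windows(datapoints, period_seconds=60):
    """Return (start, end) pairs for every period whose Sum is above zero.

    `datapoints` has the shape of the "Datapoints" list returned by
    CloudWatch GetMetricStatistics when Statistics=["Sum"] is requested.
    Periods without errors may simply be absent, so no assumption is
    made about gaps between datapoints.
    """
    hits = sorted(dp["Timestamp"] for dp in datapoints if dp.get("Sum", 0) > 0)
    return [(ts, ts + timedelta(seconds=period_seconds)) for ts in hits]


def fetch_5xx_datapoints(api_name, stage, start, end, period_seconds=60, client=None):
    """Fetch 5XXError sums for one API stage (hypothetical helper)."""
    if client is None:
        import boto3  # imported lazily so error_windows() works without it
        client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName="5XXError",
        Dimensions=[
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
        ],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

The resulting windows give a narrow time range to use when searching the execution logs.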

The dashboards on API Gateway show a graph of all the available, default metrics collected: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-api-dashboard.html

[![View Logs dropdown][1]][1]

[1]: https://i.sstatic.net/0k91J8rC.png


Solution

  • how do I drill down to individual log entries to see more info?

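    The simple answer is that you can't do it directly. AWS does not provide a way to jump from a metric datapoint to the request that failed. The built-in metrics are published by API Gateway itself rather than derived from log entries, which is why they existed before you enabled logging.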

    One approach that works well for me is going to CloudWatch Logs Insights, selecting the date range based on the relevant metric, and searching the execution logs for error-related strings. I then locate the API Gateway request ID and use it to retrieve all logs related to that specific request. This provides details about the error message and the reason for the request failure. Additionally, I extract the AWS Integration Endpoint RequestId from the logs to investigate further in the Lambda logs.
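
The two-step search described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact queries I use: it assumes execution-log lines have the form `(<request-id>) <text>` with a UUID request ID, and that failed requests log a line containing `Method completed with status: 5xx`. The helper names (`extract_request_id`, `trace_query`, `run_query`) are hypothetical; `start_query` and `get_query_results` are the boto3 CloudWatch Logs calls behind Logs Insights.

```python
import re
import time

# Assumption: execution-log lines look like "(<request-id>) <text>",
# where the request ID is a UUID.
_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
_LINE_RE = re.compile(rf"^\(({_UUID})\)\s")

# Step 1: find recent requests that finished with a 5xx status.
FIND_5XX_QUERY = "\n".join([
    "fields @timestamp, @message",
    "| filter @message like /Method completed with status: 5[0-9][0-9]/",
    "| sort @timestamp desc",
    "| limit 50",
])


def extract_request_id(message):
    """Return the leading request ID of an execution-log line, or None."""
    match = _LINE_RE.match(message)
    return match.group(1) if match else None


def trace_query(request_id):
    """Step 2: build a query returning every log line for one request."""
    if not re.fullmatch(_UUID, request_id):
        raise ValueError(f"not a UUID-shaped request ID: {request_id!r}")
    return "\n".join([
        "fields @timestamp, @message",
        f'| filter @message like "{request_id}"',
        "| sort @timestamp asc",
    ])


def run_query(log_group, query, start_epoch_s, end_epoch_s, client=None, poll_seconds=1.0):
    """Run a Logs Insights query and return rows as {field: value} dicts."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials
        client = boto3.client("logs")
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start_epoch_s,  # StartQuery expects epoch seconds
        endTime=end_epoch_s,
        queryString=query,
    )["queryId"]
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Run `FIND_5XX_QUERY` against `API-Gateway-Execution-Logs_<my-example-gateway-id>/<stage>`, pass each `@message` through `extract_request_id`, then run `trace_query(...)` for the IDs of interest.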

    You can also check the Lambda logs for error messages, as these logs are generated by your application and typically contain more detailed information. Unfortunately, this is the only way to search the logs at the moment. I know it's like searching for a needle in a haystack.
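
To make the hop from the gateway logs to the Lambda logs less of a haystack search, something like the sketch below may help. It assumes the execution log contains a line of the form `AWS Integration Endpoint RequestId : <uuid>`, that this ID is the Lambda invocation's request ID, and that the function logs to the default `/aws/lambda/<function-name>` group. The helper names are hypothetical; `filter_log_events` is the boto3 CloudWatch Logs call.

```python
import re

_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
# Assumed format of the execution-log line that carries the Lambda request ID.
_INTEGRATION_RE = re.compile(rf"AWS Integration Endpoint RequestId\s*:\s*({_UUID})")


def integration_request_id(messages):
    """Return the first integration request ID found in the messages, or None."""
    for message in messages:
        match = _INTEGRATION_RE.search(message)
        if match:
            return match.group(1)
    return None


def lambda_events(function_name, request_id, start_ms, end_ms, client=None):
    """Return Lambda log messages that mention the given request ID."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials
        client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=f"/aws/lambda/{function_name}",  # default group name
        filterPattern=f'"{request_id}"',  # quoted term: match the ID literally
        startTime=start_ms,  # FilterLogEvents expects epoch milliseconds
        endTime=end_ms,
    )
    return [event["message"] for page in pages for event in page["events"]]
```

Feed `integration_request_id` the messages returned for one API Gateway request, then pass the result to `lambda_events` with the same time range.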

    Additionally, you can explore solutions like sending logs to Amazon SQS asynchronously, which allows you to filter and process logs based on your specific needs. Once the logs are in SQS, you can query and filter them however you prefer. Another approach is to use CloudWatch subscription filters to capture and monitor specific error conditions in real time.

    Alternatively, you can forward logs to third-party platforms like Splunk or Datadog, which provide advanced querying capabilities and analytics to help you get deeper insights from your log data.