Review the built-in Amazon CloudWatch metrics and their dimensions for each of the services you plan to use so that you can decide how to best leverage them vs. adding custom metrics. There are also many third-party tools that provide monitoring and metrics reporting from CloudWatch data.
Business Key Performance Indicators (KPIs) measure your application performance against business goals. It is extremely important to know when something is critically affecting your overall business (revenue wise or not).
Examples: Orders placed, debit/credit card operations, flights purchased
Customer experience metrics
Customer experience data dictates not only the overall effectiveness of the UI/UX but also whether changes or anomalies are affecting the customer experience in a particular section of your application. These metrics are often measured in percentiles to prevent outliers when trying to understand the impact over time and how widespread it is across your customer base.
Examples: Perceived latency, time it takes to add an item to a basket/to checkout, page load times
Vendor and application metrics are important to underpin root causes. System metrics also tell you if your systems are healthy, at risk, or already impacting your customers.
Examples: Percentage of HTTP errors/success, memory utilization, function duration/error/throttling, queue length, stream records length, integration latency
Ops metrics are important to understand sustainability and maintenance of a given system and crucial to pinpoint how stability has progressed/degraded over time.
Examples: Number of tickets([un]successful resolutions, etc.), number of times people on-call were paged, availability, CI/CD pipeline stats (successful/failed deployments, feedback time, cycle and lead time)
Built-in CloudWatch metrics pages
- Amazon API Gateway Metrics
- AWS Lambda Metrics
- Amazon SQS metrics
- AWS Step Functions Metrics
- Amazon SNS Metrics
- Amazon Kinesis Data Streams Metrics
Logs let you dig into specific issues, but you can also use log data to create business-level metrics via CloudWatch Logs metric filters. You can interact with logs via CloudWatch Logs to drill into any specific log entry or filter them based on a pattern to create your own metrics. See how the services listed below interact with CloudWatch Logs.
Lambda automatically logs all requests handled by your function and stores them in CloudWatch Logs. This gives you access to information about each invocation of your Lambda function.
You can log almost anything to CloudWatch Logs by using print or standard out statements in your functions. When you create custom logs, use a structured format like a JSON event to make it easier to report from them.
API Gateway execution and access logs
API Gateway execution logs include information on errors as well as execution traces. Info like parameter values, payload, Lambda authorizers used, and API keys appear in the execution logs. You can log just errors or errors and info. Logging is set up per API stage. These logs are detailed, so you want to be thoughtful about what you need. Also, log groups don’t expire by default, so make sure to set retention values suitable to your workload.
You can also create custom access logs and send them to your preferred CloudWatch group to track who is accessing your APIs and how. You can specify the access details by selecting context variables and choosing the format you want to use.
CloudWatch Logs Insights
CloudWatch Log Insights lets you use prebuilt or custom queries on your logs to provide aggregated views and reporting. If you’ve created structured custom logs, CloudWatch Logs Insights can automatically discover the fields in your logs to make it easy to query and group your log data.
When a transaction fails, or completes slower than expected, how do you figure out where in the flow of services it failed? X-Ray gives you a visual representation of your services—a service map—that illustrates each integration point, and gives you quick insight into successes and failures. Then, you can drill down into the details of each individual trace.
You can enable X-Ray with one click for Lambda, API Gateway, and Amazon SNS. You can also turn it on for SQS queues that are not Lambda event sources, and you can add custom instrumentation to your function using the X-Ray SDK to write your own code. X-Ray integrations support both active and passive instrumentation.
You can add custom instrumentation to your function using the X-Ray SDK to write your own code. X-Ray integrations support both active and passive instrumentation:
|Service Integrations||Active Instrumentation||Passive Instrumentation|
|Samples and instruments incoming requests||Instruments requests that have been sampled by another service|
|Writes traces to X-Ray||Can add information to traces|
|Amazon API Gateway||✔️||✔️|
- CloudWatch metrics – To view how resources are performing, CloudWatch metrics is the best solution. If a developer needs to check how many times a Lambda function has been invoked,
- CloudWatch Logs Insights – CloudWatch Logs Insights enables you to interactively query your log data in CloudWatch Logs. If a team wants to search and query their logs for their API, CloudWatch Logs Insights would be the best option.
- CloudWatch Logs – You can insert logging statements into your code to help you validate that your code is working as expected. Lambda automatically integrates with CloudWatch Logs and pushes all logs from your code to CloudWatch. If an engineer wants to see what parameters are being passed into a function, they can insert logging statements in the code and check the response in CloudWatch Logs.
- X-Ray – X-Ray provides a visual map of successes and failures and lets you drill into individual traces for an execution and drill down into the details of how long each leg of the execution took.
- Records IAM user, IAM role, and AWS service API activity in your account.
- Is enabled when you create an account.
- Provides full details about the API action, like identity of the requestor, time of the API call, request parameters, and response elements returned by the service.
When activity occurs in your AWS account, that activity is recorded in a CloudTrail event, and you can see recent events in the event history.
The CloudTrail event history provides a viewable, searchable, and downloadable record of the past 90 days of CloudTrail events. Use this history to gain visibility into actions taken in your AWS account in the AWS Management Console, AWS SDKs, command line tools, and other AWS services.
A trail is a configuration that enables delivery of CloudTrail events to an Amazon S3 bucket, CloudWatch Logs, and CloudWatch Events. If you need to maintain a longer history of events, you can create your own trail. When you create a trail, it tracks events performed on or within resources in your AWS account and writes them to an S3 bucket you specify.
For example, a trail could capture modifications to your API Gateway APIs. You can optionally add data events to track S3 object-level API activity (like when someone uploads something to the bucket) or Lambda invoke API operations on one or all future Lambda functions in the account.
You can configure CloudTrail Insights on your trails to help you identify and respond to unusual activity associated with write API calls. CloudTrail Insights is a feature that tracks your normal patterns of API call volume and generates Insights events when the volume is outside normal patterns.