AWS Monitoring

Welcome to Lesson 10: Monitoring. In this lesson, we’ll explore the different types of metrics used to monitor various AWS services like Amazon S3, Amazon RDS, and Amazon EC2. Understanding these metrics is crucial for maintaining system performance, reliability, and security.

Types of metrics:

  • Amazon S3 metrics:
    • total size of the objects stored in a bucket
    • number of objects stored in a bucket
    • number of HTTP requests made to a bucket
  • Amazon RDS metrics:
    • database connections
    • CPU utilization of a DB instance
    • disk space consumption
  • Amazon EC2 metrics:
    • CPU utilization
    • network utilization
    • disk performance
    • status checks
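
In CloudWatch, each metric above is published under a service namespace with a specific metric name. The lookup table below is a minimal sketch. It assumes the standard names from the CloudWatch documentation for each service. S3 request metrics such as AllRequests only appear after request metrics are enabled on the bucket. The EC2 disk metrics cover instance store volumes, while EBS volumes report separately (for example under AWS/EBS). The dictionary and the metric_names helper are hypothetical.

```python
# Hypothetical lookup table: metrics from the list above -> CloudWatch namespace and metric names.
CLOUDWATCH_METRICS = {
    "Amazon S3": {
        "namespace": "AWS/S3",
        "metrics": {
            "total size of objects": ["BucketSizeBytes"],  # reported once per day
            "number of objects": ["NumberOfObjects"],  # reported once per day
            "number of HTTP requests": ["AllRequests"],  # needs request metrics enabled
        },
    },
    "Amazon RDS": {
        "namespace": "AWS/RDS",
        "metrics": {
            "database connections": ["DatabaseConnections"],
            "CPU utilization": ["CPUUtilization"],
            "disk space consumption": ["FreeStorageSpace"],  # free space remaining, in bytes
        },
    },
    "Amazon EC2": {
        "namespace": "AWS/EC2",
        "metrics": {
            "CPU utilization": ["CPUUtilization"],
            "network utilization": ["NetworkIn", "NetworkOut"],
            # instance store volumes; EBS volumes report separately
            "disk performance": ["DiskReadOps", "DiskWriteOps", "DiskReadBytes", "DiskWriteBytes"],
            "status checks": ["StatusCheckFailed", "StatusCheckFailed_Instance", "StatusCheckFailed_System"],
        },
    },
}


def metric_names(service: str, topic: str) -> tuple:
    """Return (namespace, list of metric names) for a service and a topic from the list above."""
    entry = CLOUDWATCH_METRICS[service]
    return entry["namespace"], entry["metrics"][topic]
```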

Monitoring benefits:

  • allows you to respond to issues proactively
  • improves performance and reliability – you learn how the system behaves and where the potential bottlenecks are
  • helps you recognize security threats and events
  • lets you make data-driven decisions
  • helps you create cost-effective solutions

CloudWatch:

  • monitoring and observability service that collects your resource data
  • you can use it to:
    • detect anomalous behavior in your environments
    • set alarms to alert you when something is not right
    • visualize logs and metrics with the AWS Management Console
    • take automated actions like scaling
    • troubleshoot issues
    • discover insights to keep your applications healthy
  • many AWS services automatically send metrics to CloudWatch for free at a rate of one data point per metric per 5-minute interval (this is called basic monitoring)
  • for applications running on EC2 instances, you can get more granularity by enabling detailed monitoring, which sends metrics every minute instead of every 5 minutes; this incurs a fee
  • AWS services that send data to CloudWatch attach dimensions to each metric; dimensions are name/value pairs (such as InstanceId) that identify which resource a data point belongs to
  • for a charge, you can publish your own application metrics on resources such as your EC2 instances
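
A CloudWatch metric is identified by its namespace, its name and its dimensions. The same metric name reported by two EC2 instances therefore counts as two separate metrics. The sketch below models that identity with a small frozen dataclass. It also works out how many data points per day basic (5-minute) and detailed (1-minute) monitoring produce. The MetricId class, the datapoints_per_day function and the instance IDs are hypothetical placeholders.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60
BASIC_PERIOD = 300     # basic monitoring: one data point every 5 minutes
DETAILED_PERIOD = 60   # detailed monitoring: one data point every minute


@dataclass(frozen=True)
class MetricId:
    """Hypothetical model of what makes a CloudWatch metric unique."""
    namespace: str
    name: str
    dimensions: frozenset  # set of (dimension name, dimension value) pairs


def datapoints_per_day(period_seconds: int) -> int:
    """Number of data points a single metric produces per day at the given period."""
    return SECONDS_PER_DAY // period_seconds


# Same namespace and metric name, different InstanceId dimension -> two distinct metrics.
cpu_a = MetricId("AWS/EC2", "CPUUtilization", frozenset({("InstanceId", "i-aaaa1111")}))
cpu_b = MetricId("AWS/EC2", "CPUUtilization", frozenset({("InstanceId", "i-bbbb2222")}))
```

With basic monitoring each metric yields 288 data points per day. Detailed monitoring yields 1,440. cpu_a and cpu_b compare unequal because only their dimensions differ.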

Custom metrics:

  • with custom metrics, you can publish your own metrics to CloudWatch.
  • you can use high-resolution custom metrics, which let you collect custom metrics at a resolution as fine as 1 second
  • examples of custom metrics: webpage load times, request error rates, and the number of processes or threads on your instance
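
Publishing a custom metric comes down to a PutMetricData call. The sketch below only builds the keyword arguments for boto3's CloudWatch put_metric_data, so it runs without AWS credentials. The namespace MyApp, the metric PageLoadTime and the Page dimension are hypothetical. StorageResolution=1 requests a high-resolution (1-second) metric, and 60 requests standard resolution.

```python
from datetime import datetime, timezone


def page_load_metric(load_time_ms: float, page: str, high_resolution: bool = False) -> dict:
    """Build keyword arguments for CloudWatch PutMetricData (boto3: put_metric_data)."""
    return {
        "Namespace": "MyApp",  # hypothetical custom namespace; it may not start with "AWS/"
        "MetricData": [
            {
                "MetricName": "PageLoadTime",  # hypothetical metric name
                "Dimensions": [{"Name": "Page", "Value": page}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": load_time_ms,
                "Unit": "Milliseconds",
                # 1 = high-resolution (1-second) metric, 60 = standard resolution
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


metric_params = page_load_metric(412.5, "/checkout", high_resolution=True)
# With AWS credentials configured, the request would be sent with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**metric_params)
```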

CloudWatch dashboards:

  • customizable home pages you can configure for data visualization
  • you can use external or custom tools to ingest and analyze CloudWatch metrics using the GetMetricData API
  • with IAM policies, you control who has access to view or manage your CloudWatch dashboards
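
An external tool usually pulls metrics with GetMetricData, which accepts several metric queries in one request. The sketch below builds the request for boto3's get_metric_data to fetch average EC2 CPU utilization over the last three hours. The instance ID and the query ID cpu are placeholders, and the three-hour window is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def cpu_query(instance_id: str, hours: int = 3, period: int = 300) -> dict:
    """Build keyword arguments for CloudWatch GetMetricData (boto3: get_metric_data)."""
    end = datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [
            {
                "Id": "cpu",  # query IDs must start with a lowercase letter
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/EC2",
                        "MetricName": "CPUUtilization",
                        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                    },
                    "Period": period,  # 300 seconds matches basic monitoring
                    "Stat": "Average",
                },
                "ReturnData": True,
            }
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


request = cpu_query("i-0123456789abcdef0")
# With AWS credentials configured:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_data(**request)
#   values = response["MetricDataResults"][0]["Values"]
```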

Amazon CloudWatch Logs:

  • centralized place for logs to be stored and analyzed
  • you can query and filter your log data
  • some services, like Lambda, are set up to send log data to CloudWatch Logs with minimal effort (all you need to do is give the Lambda function the correct IAM permissions to post logs to CloudWatch Logs)
  • to send your application logs from an EC2 instance into CloudWatch Logs, you need to install and configure an agent on the instance: the CloudWatch Logs agent, or its current replacement, the unified CloudWatch agent
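
Querying and filtering log data can be done with CloudWatch Logs Insights. The sketch below builds the arguments for boto3's start_query on the logs client, to find the 20 most recent log events containing ERROR. The log group name /myapp/web is hypothetical. start_query is asynchronous: it returns a queryId, and the results are fetched afterwards with get_query_results.

```python
import time

ERROR_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | sort @timestamp desc"
    " | limit 20"
)


def error_query(log_group: str, minutes: int = 60) -> dict:
    """Build keyword arguments for CloudWatch Logs StartQuery (boto3: start_query)."""
    end = int(time.time())  # startTime and endTime are expressed in epoch seconds
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,
        "endTime": end,
        "queryString": ERROR_QUERY,
    }


query_params = error_query("/myapp/web")  # hypothetical log group
# With AWS credentials configured:
#   import boto3
#   logs = boto3.client("logs")
#   query_id = logs.start_query(**query_params)["queryId"]
#   results = logs.get_query_results(queryId=query_id)  # poll until status is "Complete"
```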

CloudWatch Logs terminology:

  • Log event – a record of activity, consisting of a timestamp and an event message
  • Log stream – a sequence of log events that belong to the same monitored resource (e.g. a single EC2 instance)
  • Log group – a collection of log streams that share the same retention and permissions settings (e.g. log streams from several EC2 instances)
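
The three terms nest. A log group holds log streams, and each log stream holds an ordered sequence of log events. The hypothetical in-memory model below mirrors that hierarchy. The class names, the put method and the group /myapp/web are illustrative only and are not a CloudWatch API. The model does follow two CloudWatch conventions: log event timestamps are in milliseconds, and retention is set per log group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEvent:
    timestamp_ms: int  # milliseconds since the Unix epoch
    message: str


@dataclass
class LogStream:
    name: str  # typically identifies one source, e.g. a single EC2 instance
    events: List[LogEvent] = field(default_factory=list)


@dataclass
class LogGroup:
    name: str
    retention_days: Optional[int] = None  # applies to every stream; None = never expire
    streams: Dict[str, LogStream] = field(default_factory=dict)

    def put(self, stream_name: str, timestamp_ms: int, message: str) -> None:
        """Append an event to a stream, creating the stream on first use."""
        stream = self.streams.setdefault(stream_name, LogStream(stream_name))
        stream.events.append(LogEvent(timestamp_ms, message))


group = LogGroup("/myapp/web", retention_days=30)
group.put("i-aaaa1111", 1_700_000_000_000, "GET /index.html 200")
group.put("i-aaaa1111", 1_700_000_001_000, "GET /missing 404")
group.put("i-bbbb2222", 1_700_000_002_000, "POST /login 200")
```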

CloudWatch alarms:

  • automatically initiate actions based on sustained state changes of your metrics
  • you choose a metric, a threshold, and the time period over which the threshold is evaluated
  • when an alarm is triggered, it can initiate an action (e.g. an automatic scaling action, or a notification sent through Amazon Simple Notification Service (Amazon SNS))
  • three possible states of an alarm:
    • OK – the metric is within the defined threshold
    • ALARM – the metric is outside the defined threshold
    • INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or not enough data is available to determine the state
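
The three alarm states can be illustrated with a simplified evaluator. This is a sketch under stated assumptions, not CloudWatch's actual algorithm:
- the comparison is "greater than threshold"
- every one of the last evaluation_periods data points must breach before the state becomes ALARM
- with fewer data points than that, the state is INSUFFICIENT_DATA

Real alarms also support "M out of N" evaluation and configurable missing-data handling. The second helper builds matching arguments for boto3's put_metric_alarm. The alarm naming scheme and the instance ID are placeholders.

```python
from typing import List


def evaluate_alarm(datapoints: List[float], threshold: float, evaluation_periods: int) -> str:
    """Simplified alarm state: ALARM only if every recent data point exceeds the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


def cpu_alarm_params(instance_id: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm (boto3: put_metric_alarm)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # length of one evaluation period, in seconds
        "EvaluationPeriods": 3,  # how many periods the breach must be sustained
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # An "AlarmActions" list (e.g. an SNS topic ARN) would define what happens on ALARM.
    }
```

The three-period requirement is what makes an alarm react to sustained state changes rather than a single spike.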

Solution Optimization:

  • availability – typically expressed as a percentage of uptime in a given year or as a number of nines
  • 99.999% (five nines of availability) – about 5.26 minutes of downtime per year
  • to increase availability, you need redundancy
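
The downtime figure comes from multiplying the unavailable fraction by the number of minutes in a year. The same arithmetic shows why redundancy helps. The sketch assumes a 365-day year. For the redundancy case it also assumes that the copies fail independently and that one working copy is enough. Real failures are often correlated, so the redundant figure is an optimistic estimate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction (0.99999 = five nines)."""
    return (1 - availability) * MINUTES_PER_YEAR


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of independent replicas when any single replica can serve traffic."""
    return 1 - (1 - availability) ** copies


print(round(downtime_minutes_per_year(0.99999), 2))  # five nines -> 5.26
print(round(redundant_availability(0.99, 2), 6))     # two 99% instances -> 0.9999
```

Two independent instances at 99% each reach roughly 99.99%. That is the motivation for running a second copy of the application.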

Adding a second Availability Zone:

  • the physical location of a server is important – if the hardware (or the data center hosting it) fails, an application running on a single server goes down with it
  • to remedy the physical location issue, you can deploy a second EC2 instance in a second Availability Zone
  • running more than one instance brings new challenges, such as the following:
    • Replication process – configuration files, software patches, and the application itself must be replicated across instances
    • Customer redirection – clients must be directed to the available servers. The most common approach is DNS, where clients use one record that points to the IP addresses of all available servers. However, this method isn’t always used, because DNS changes take time to propagate
    • Types of high availability:
      • active-passive – only one of the two instances is available at a time. This suits stateful applications, because customers are always sent to the server where their session is stored
      • active-active – both servers are available, which improves scalability, but stateful applications run into problems because a customer’s requests may reach a server that doesn’t hold their session
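
The difference between the two high-availability types comes down to which healthy instances receive traffic. The routing functions below are a hypothetical, in-process illustration; in practice a DNS record or a load balancer makes this decision. The instance names web-az1 and web-az2 are placeholders.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def active_passive_target(primary: str, standby: str, healthy: Dict[str, bool]) -> str:
    """Send every request to the primary; fail over to the standby only if the primary is down."""
    if healthy.get(primary, False):
        return primary
    if healthy.get(standby, False):
        return standby
    raise RuntimeError("no healthy instance available")


def active_active_targets(instances: List[str], healthy: Dict[str, bool]) -> Iterator[str]:
    """Spread requests round-robin across every healthy instance."""
    available = [name for name in instances if healthy.get(name, False)]
    if not available:
        raise RuntimeError("no healthy instance available")
    return cycle(available)


health = {"web-az1": True, "web-az2": True}
router = active_active_targets(["web-az1", "web-az2"], health)
first_four = [next(router) for _ in range(4)]  # alternates between the two instances
```

In active-active mode, consecutive requests from one customer alternate between web-az1 and web-az2. That is why a session stored on only one server breaks stateful applications. The active-passive function always returns the same server while it is healthy.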

Tools for monitoring costs in AWS:

  • AWS Cost Explorer – visualizes costs at daily or monthly granularity, using data from the last 12 months
  • AWS Budgets – lets you set a budget and receive alerts when costs exceed it or are forecast to exceed it
  • AWS Cost and Usage Report – the most detailed cost and usage data available
  • Cost Optimization Monitor – a complete solution architecture that collects, analyzes, and presents cost data
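
Cost Explorer and AWS Budgets can also be driven from code. The sketch below builds requests for Cost Explorer's get_cost_and_usage (boto3 client name ce) and for Budgets' create_budget (boto3 client name budgets). The account ID, budget name, amount and email address are placeholders. The two notifications match the two Budgets cases above:
- ACTUAL fires when spending exceeds 100% of the budget.
- FORECASTED fires when spending is predicted to exceed it.

```python
from datetime import date, timedelta


def monthly_cost_request(months_back: int = 3) -> dict:
    """Arguments for Cost Explorer GetCostAndUsage at monthly granularity."""
    end = date.today().replace(day=1)  # End is exclusive: stop at the start of this month
    start = end
    for _ in range(months_back):
        start = (start - timedelta(days=1)).replace(day=1)  # step back one calendar month
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",  # "DAILY" is also available
        "Metrics": ["UnblendedCost"],
    }


def budget_request(account_id: str, limit_usd: str, email: str) -> dict:
    """Arguments for AWS Budgets CreateBudget with actual and forecasted alerts."""
    def notification(kind: str) -> dict:
        return {
            "Notification": {
                "NotificationType": kind,  # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 100.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }

    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-budget",  # hypothetical name
            "BudgetLimit": {"Amount": limit_usd, "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [notification("ACTUAL"), notification("FORECASTED")],
    }
```

With AWS credentials configured, the dictionaries would be passed on unchanged. The first goes to boto3.client("ce").get_cost_and_usage(**monthly_cost_request()). The second goes to boto3.client("budgets").create_budget(**budget_request(...)).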

Monitoring your AWS resources is essential for proactive management, performance optimization, and cost control. By leveraging the power of Amazon CloudWatch, you can gain valuable insights, automate responses to potential issues, and ensure your applications are running efficiently. Remember to regularly review and adjust your monitoring strategies to stay ahead of any challenges and keep your system at its best.