Amazon CloudWatch is the centralized observability service for monitoring AWS resources and applications.
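- CloudWatch Metrics: time-series data points (CPU, memory, request count, latency). AWS services emit metrics automatically (e.g., EC2 CPUUtilization every 5 minutes, or every 1 minute with detailed monitoring); EC2 memory and disk-space utilization are not among the default metrics and require the CloudWatch agent. Applications publish custom metrics via the PutMetricData API or the CloudWatch agent (starting at $0.30/metric/month, with lower rates at higher volumes).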
- CloudWatch Logs: collects, stores, and searches log data. Log Groups (containers) hold Log Streams (e.g., one per EC2 instance or Lambda execution environment); Metric Filters turn matching log patterns into metrics; CloudWatch Logs Insights runs ad-hoc queries in its own query language; logs can be exported to S3 for long-term retention; Subscription Filters stream logs in real time to Lambda, Kinesis, or OpenSearch.
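- CloudWatch Alarms: trigger when a metric breaches a threshold for a configured number of evaluation periods; actions include SNS notifications, EC2 Auto Scaling, and EC2 actions (stop/reboot/terminate); Composite Alarms combine multiple alarms with a rule expression (AND/OR/NOT); Anomaly Detection uses ML to build a band of expected values that an alarm can use instead of a static threshold.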
- CloudWatch Dashboards: visualize metrics and logs in a single view; a dashboard can combine widgets from multiple accounts and regions.
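The Metrics bullet mentions publishing custom metrics through PutMetricData. A minimal sketch, assuming a boto3-style CloudWatch client (boto3.client("cloudwatch")) is passed in; the namespace MyApp, metric OrdersPlaced, and dimension Service are hypothetical. The client is injected rather than created so the request can be inspected without AWS credentials.

```python
from datetime import datetime, timezone
from typing import Any


def build_put_metric_data(namespace: str, name: str, value: float,
                          unit: str, dimensions: dict[str, str]) -> dict[str, Any]:
    """Build keyword arguments for boto3's cloudwatch.put_metric_data."""
    return {
        "Namespace": namespace,  # custom namespaces must not start with "AWS/"
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,  # e.g. "Count", "Milliseconds", "Percent"
                # Adding "StorageResolution": 1 would make this a high-resolution metric.
            }
        ],
    }


def publish(client: Any, **kwargs: Any) -> None:
    """Send one data point; `client` is assumed to be boto3.client("cloudwatch")."""
    client.put_metric_data(**build_put_metric_data(**kwargs))


if __name__ == "__main__":
    # Print the request instead of calling AWS.
    print(build_put_metric_data("MyApp", "OrdersPlaced", 1, "Count", {"Service": "checkout"}))
```

Each distinct combination of namespace, metric name, and dimension values counts as a separate billed custom metric, so high-cardinality dimension values (such as request IDs) raise cost quickly.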
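The Logs bullet describes Metric Filters and Logs Insights. The sketch below builds request arguments for boto3's logs.put_metric_filter and logs.start_query; the log group /aws/lambda/checkout, filter name, and metric names are hypothetical, and the filter assumes unstructured log lines that contain the term ERROR.

```python
from datetime import datetime, timezone
from typing import Any

# Logs Insights query: count of ERROR lines per 5-minute bin.
ERRORS_PER_5_MIN = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | stats count(*) as errors by bin(5m)"
)


def build_metric_filter(log_group: str) -> dict[str, Any]:
    """Kwargs for put_metric_filter: emit 1 for each log event containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        "filterPattern": "ERROR",  # simple term match (case-sensitive)
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp",
            "metricValue": "1",    # value published per matching event
            "defaultValue": 0.0,   # published when no events match
        }],
    }


def build_insights_query(log_group: str, hours: int = 1) -> dict[str, Any]:
    """Kwargs for start_query covering the last `hours` hours (epoch seconds)."""
    end = int(datetime.now(timezone.utc).timestamp())
    start = end - hours * 3600
    return {
        "logGroupName": log_group,
        "startTime": start,
        "endTime": end,
        "queryString": ERRORS_PER_5_MIN,
    }
```

start_query returns a queryId; results are then polled with get_query_results(queryId=...) until the status is Complete.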
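To make the Alarms bullet concrete, here is a sketch of arguments for boto3's cloudwatch.put_metric_alarm and put_composite_alarm. It assumes an Application Load Balancer (whose TargetResponseTime metric is in seconds) and an existing SNS topic; the alarm names, load balancer dimension value, instance ID, and topic ARN are hypothetical.

```python
from typing import Any


def latency_alarm(lb_dimension: str, topic_arn: str) -> dict[str, Any]:
    """Kwargs for put_metric_alarm: ALB p99 target response time above 2 s."""
    return {
        "AlarmName": "high-p99-latency",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",  # reported in seconds
        "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
        "ExtendedStatistic": "p99",          # percentile, used instead of Statistic
        "Period": 60,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,              # 3 breaching minutes out of the last 5
        "Threshold": 2.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def cpu_alarm(instance_id: str, topic_arn: str) -> dict[str, Any]:
    """Kwargs for put_metric_alarm: EC2 average CPU above 80% for three 5-minute periods."""
    return {
        "AlarmName": "high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


def composite_alarm(children: list[str], topic_arn: str) -> dict[str, Any]:
    """Kwargs for put_composite_alarm: fire only when every child alarm is in ALARM."""
    return {
        "AlarmName": "checkout-degraded",
        "AlarmRule": " AND ".join(f'ALARM("{name}")' for name in children),
        "AlarmActions": [topic_arn],
    }
```

Routing notifications through the composite alarm, and leaving the child alarms without actions, is one way to reduce alert noise when several symptoms of the same incident fire together.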
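For the Dashboards bullet, a dashboard is defined by a JSON body passed to boto3's cloudwatch.put_dashboard. This sketch builds a body with one CPU graph per EC2 instance; because each metric widget carries its own region property, one dashboard can mix regions. The instance IDs and regions are hypothetical.

```python
import json


def dashboard_body(instances: list[tuple[str, str]]) -> str:
    """Dashboard JSON with one CPU graph per (instance_id, region), two per row."""
    widgets = []
    for i, (instance_id, region) in enumerate(instances):
        widgets.append({
            "type": "metric",
            "x": (i % 2) * 12,   # the dashboard grid is 24 units wide
            "y": (i // 2) * 6,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
                "stat": "Average",
                "period": 300,
                "region": region,  # per-widget region enables cross-region dashboards
                "title": f"CPU {instance_id} ({region})",
            },
        })
    return json.dumps({"widgets": widgets})
```

The result would be sent with something like put_dashboard(DashboardName="checkout", DashboardBody=dashboard_body([...])).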
Best practices: enable detailed monitoring for production EC2 instances; set alarms on critical metrics (e.g., error rate > 1%, P99 latency > 2 s, CPU > 80%, tuned to each workload); use EMF (Embedded Metric Format) to emit structured metrics from Lambda logs without calling PutMetricData; create a dashboard per service; and set log retention policies to control cost, since log groups never expire by default.
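The EMF recommendation works because CloudWatch extracts metrics from log lines that follow the Embedded Metric Format, and anything a Lambda function prints to stdout lands in CloudWatch Logs. A minimal stdlib-only sketch; the namespace MyApp, dimension Service, metric Latency, and the handler itself are hypothetical. AWS also publishes client libraries for EMF (e.g., aws-embedded-metrics for Python) that could replace the hand-built JSON.

```python
import json
import time
from typing import Any


def emf_record(namespace: str, service: str, latency_ms: float) -> str:
    """Build one EMF log line declaring a Latency metric with a Service dimension."""
    record: dict[str, Any] = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service"]],       # list of dimension sets (lists of keys)
                "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
            }],
        },
        "Service": service,     # dimension value, referenced by key above
        "Latency": latency_ms,  # metric value, referenced by Name above
    }
    return json.dumps(record)


def handler(event: dict, context: Any) -> dict:
    """Hypothetical Lambda handler: printing the EMF line sends it to CloudWatch Logs."""
    start = time.perf_counter()
    # ... business logic would run here ...
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(emf_record("MyApp", "checkout", elapsed_ms))
    return {"statusCode": 200}
```

Metric extraction happens asynchronously inside CloudWatch, so the function pays no extra API latency; the same line also remains searchable as a log event.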
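The retention recommendation can be automated with boto3's logs.put_retention_policy, which only accepts a fixed set of day values. The sketch below rounds a required minimum up to the nearest accepted value; the ALLOWED_RETENTION_DAYS set reflects the values documented at the time of writing and should be checked against the current API reference. The log group prefix passed to apply_retention is an assumption.

```python
from typing import Any

# Values accepted by PutRetentionPolicy (days), as documented at the time of writing.
ALLOWED_RETENTION_DAYS = {1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
                          731, 1096, 1827, 2192, 2557, 2922, 3288, 3653}


def retention_request(log_group: str, min_days: int) -> dict[str, Any]:
    """Kwargs for put_retention_policy using the smallest allowed value >= min_days."""
    candidates = [d for d in sorted(ALLOWED_RETENTION_DAYS) if d >= min_days]
    if not candidates:
        raise ValueError(f"no allowed retention covers {min_days} days")
    return {"logGroupName": log_group, "retentionInDays": candidates[0]}


def apply_retention(logs_client: Any, log_group_prefix: str, min_days: int) -> None:
    """Set retention on groups under the prefix that currently never expire."""
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=log_group_prefix):
        for group in page["logGroups"]:
            if "retentionInDays" not in group:  # key is absent when data never expires
                logs_client.put_retention_policy(
                    **retention_request(group["logGroupName"], min_days))
```

Skipping groups that already have a retention setting avoids overriding deliberate choices; dropping that check would enforce one policy everywhere.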