AWS Auto Scaling automatically adjusts the number of instances, containers, or Lambda concurrency based on demand. An Auto Scaling group (ASG) for EC2 supports four ways to scale:
(1) Target Tracking Scaling: the simplest option. It holds a metric at a target value (e.g., keep CPU at 50%), and the ASG works out when to scale out or in by itself; it suits most use cases. For example, with my-asg and cpu50-target-tracking as placeholder names:
aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg --policy-name cpu50-target-tracking --policy-type TargetTrackingScaling --target-tracking-configuration file://config.json
(2) Step Scaling: defines steps according to how far the alarm threshold is breached (e.g., CPU 60-70% → add 1, 70-80% → add 2, >80% → add 4); more flexible, but it needs tuning.
(3) Scheduled Scaling: scales at times known in advance (e.g., scale up every weekday morning, scale down at the end of the day); a good fit when traffic follows a predictable schedule.
(4) Predictive Scaling: machine learning analyzes historical patterns, forecasts load, and pre-scales before the traffic arrives, which avoids the lag of reactive scaling; instances launch roughly 5-6 minutes ahead of a forecast increase.
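The target tracking and step scaling examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary shows what the config.json passed to the command could contain for a 50% CPU target, the step boundaries are one reading of the ranges in the text (70% and 80% both fall into the "add 2" step), and proportional_capacity is a simplified estimate, not the algorithm AWS actually uses. The function names are hypothetical.

```python
import json
import math

# Possible contents of config.json for the target tracking command:
# keep the group's average CPU utilization near 50%.
TARGET_TRACKING_CONFIG = {
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0,
}


def step_adjustment(cpu_percent: float) -> int:
    """Instances to add for the step example in the text.

    Assumed boundaries: [60, 70) -> 1, [70, 80] -> 2, above 80 -> 4.
    """
    if cpu_percent > 80:
        return 4
    if cpu_percent >= 70:
        return 2
    if cpu_percent >= 60:
        return 1
    return 0


def proportional_capacity(current: int, metric: float, target: float) -> int:
    """Simplified estimate only: scale capacity by metric/target, rounded up."""
    return max(1, math.ceil(current * metric / target))


if __name__ == "__main__":
    # Print the JSON that could be saved as config.json.
    print(json.dumps(TARGET_TRACKING_CONFIG, indent=2))
    for cpu in (55, 65, 75, 90):
        print(f"CPU {cpu}% -> add {step_adjustment(cpu)} instance(s)")
    # 4 instances at 75% CPU with a 50% target suggests about 6 instances.
    print(proportional_capacity(4, 75.0, 50.0))
```

Writing the step boundaries out as code makes the tuning burden visible: every threshold and increment is a value that has to be chosen, whereas target tracking needs only the single TargetValue.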
A cooldown period (300 seconds by default) prevents thrashing. Scale-in protection keeps instances that are handling long-running jobs from being terminated during scale-in. ECS Service Auto Scaling works much like an ASG. Lambda Provisioned Concurrency mitigates cold starts for Lambda functions. Application Auto Scaling covers ECS, DynamoDB, Aurora, Kinesis, and SageMaker. Always verify scaling behavior with load tests before going to production.
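To make the cooldown and scale-in protection ideas concrete, here is a minimal local simulation. It does not call any AWS API; CooldownGate and scale_in_candidates are hypothetical helpers, and the real service applies more rules than this sketch does.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CooldownGate:
    """Hypothetical helper: blocks new scaling activity during a cooldown."""

    cooldown_seconds: float = 300.0  # the default cooldown cited in the text
    last_activity: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Return True and record the activity if the cooldown has elapsed."""
        if (
            self.last_activity is not None
            and now - self.last_activity < self.cooldown_seconds
        ):
            return False  # still cooling down: ignore this trigger
        self.last_activity = now
        return True


def scale_in_candidates(instances: Dict[str, bool]) -> List[str]:
    """Return the IDs of unprotected instances, sorted.

    `instances` maps an instance ID to True when it is protected from scale-in.
    """
    return sorted(i for i, protected in instances.items() if not protected)


if __name__ == "__main__":
    gate = CooldownGate()
    # Triggers arrive at these timestamps (seconds); only some are allowed.
    for t in (0.0, 120.0, 300.0, 400.0):
        print(t, gate.try_scale(t))
    # "i-a" is busy with a long-running job, so it is marked as protected.
    print(scale_in_candidates({"i-a": True, "i-b": False, "i-c": False}))
```

The gate shows why repeated alarms inside the cooldown window do not each trigger a new scaling activity, which is the thrashing the cooldown is meant to prevent.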