What is Rate Limiting? What are the common rate limiting algorithms, and how are they implemented?

Rate Limiting is the technique of controlling how frequently a client, IP, or user may send requests, in order to protect the system from abuse and DDoS attacks and to ensure fair usage.

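Common algorithms:
- Token Bucket: a bucket holds up to a fixed number of tokens; each request consumes one token, and tokens are refilled at a fixed rate. A request that finds the bucket empty is rejected. Because a full bucket can be drained at once, short bursts are allowed.
- Leaky Bucket: requests enter a queue and are processed at a constant rate, like water dripping through a hole; requests that arrive when the queue is full are dropped. This smooths out bursts but does not allow them.
- Fixed Window Counter: counts requests within a fixed window (e.g., per minute) and resets the count when the window ends. It is simple but has a boundary problem: a client can spend a full quota at the end of one window and another at the start of the next, sending up to twice the limit in a short span.
- Sliding Window Log: stores the timestamp of each request and counts those that fall within the last window. It is the most accurate but memory-intensive, since memory grows with the number of requests.
- Sliding Window Counter: combines the fixed window and sliding approaches by keeping counters for the current and previous windows and weighting the previous count by how much of that window still overlaps the sliding window. It approximates the log at a fraction of the memory, which makes it a good balance.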
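The Token Bucket description above can be sketched as follows. This is a minimal single-process illustration, not a production implementation: the class name `TokenBucket`, the parameters `capacity` and `refill_rate`, and the injectable `clock` are assumptions made for the example, and tokens are refilled lazily on each call rather than by a background timer.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock             # injectable so the behaviour can be tested
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1):
        """Consume `cost` tokens and return True if available, else return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily from the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: a burst of up to 5 requests, then roughly 1 request per second.
bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow() for _ in range(7)]
```

Here `capacity` sets the largest burst and `refill_rate` sets the sustained rate, which is why the two can be tuned independently.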
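Implementation: for distributed rate limiting, keep a per-client counter in Redis with INCR + EXPIRE so that every application instance shares the same count; alternatively, use Nginx rate limiting modules (e.g., `limit_req`) or the built-in features of an API gateway (AWS API Gateway, Kong).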
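The INCR + EXPIRE approach amounts to a fixed window counter, and a rough sketch looks like the code below. To keep the example runnable without a server, it uses `InMemoryStore`, a hypothetical stand-in that mimics only the two commands needed; a real Redis client such as redis-py exposes `incr` and `expire` methods that could be passed in instead. The function name `is_allowed` and the `ratelimit:` key prefix are also assumptions.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a Redis client: mimics INCR and EXPIRE in one process."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.counts = {}     # key -> integer counter
        self.deadlines = {}  # key -> time at which the key expires

    def _purge(self, key):
        # Drop the key if its expiry time has passed, as Redis would.
        if key in self.deadlines and self.clock() >= self.deadlines[key]:
            self.counts.pop(key, None)
            self.deadlines.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        if key in self.counts:
            self.deadlines[key] = self.clock() + seconds


def is_allowed(store, client_id, limit, window_seconds):
    """Fixed window counter: allow at most `limit` requests per window per client."""
    key = f"ratelimit:{client_id}"
    count = store.incr(key)
    if count == 1:
        # First request of a new window: start the expiry timer.
        store.expire(key, window_seconds)
    return count <= limit


# Usage: at most 100 requests per 60 seconds for this client.
store = InMemoryStore()
ok = is_allowed(store, "203.0.113.7", limit=100, window_seconds=60)
```

In this variant the window starts at the client's first request rather than on a clock boundary. Note also that INCR and EXPIRE are two separate commands: if the process fails between them, the key never expires, so real deployments usually make the pair atomic, for example with a Lua script or a transaction.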
Decide how to key the rate limit (by IP, user_id, or API key) and what to do when the limit is exceeded: typically, return 429 Too Many Requests with a Retry-After header that tells the client how long to wait before retrying.
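As an illustration of those two decisions, the sketch below picks a key and builds the rejection response. Both helper names (`rate_limit_key`, `too_many_requests`) are hypothetical, the priority order of API key, then user_id, then IP is an assumption rather than a rule, and the response is a framework-agnostic tuple instead of any particular web framework's response object.

```python
import math
from http import HTTPStatus


def rate_limit_key(ip, user_id=None, api_key=None):
    """Pick the identifier to count requests against."""
    # Assumed priority: API key, then user_id, then IP as the fallback.
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{ip}"


def too_many_requests(retry_after_seconds):
    """Build a (status, headers, body) tuple for a request that exceeded the limit."""
    # Retry-After carries whole seconds here; round up so the client
    # does not retry before the limit has actually reset.
    wait = max(1, math.ceil(retry_after_seconds))
    status = HTTPStatus.TOO_MANY_REQUESTS  # 429
    headers = {
        "Retry-After": str(wait),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = f"{status.value} {status.phrase}: retry in {wait}s\n"
    return status.value, headers, body


# Usage: reject a request whose limit resets in 12.3 seconds.
key = rate_limit_key("203.0.113.7", user_id=42)
status, headers, body = too_many_requests(12.3)
```

Keying by IP alone can penalize many users behind a shared NAT, which is one reason to prefer a user or API key identifier when the request is authenticated.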

