Requirements: generate a short URL from a long URL; redirect a short URL to its long URL; handle ~100M new URLs/day (writes) and ~1B redirects/day (reads), a read-heavy 10:1 ratio.
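As a rough sanity check on the numbers above, the daily volumes can be converted to average requests per second. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day, which real traffic is not, so peak load would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = 100_000_000    # ~100M new URLs/day (from the requirements)
reads_per_day = 1_000_000_000   # ~1B redirects/day (from the requirements)

# Averages only; an even spread over the day is an assumption.
avg_write_qps = writes_per_day / SECONDS_PER_DAY
avg_read_qps = reads_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / writes_per_day

print(f"avg writes/s: {avg_write_qps:,.0f}")
print(f"avg reads/s:  {avg_read_qps:,.0f}")
print(f"read:write  = {read_write_ratio:.0f}:1")
```

So the write path averages on the order of a thousand requests per second and the read path on the order of ten thousand.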
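- Hash Generation: a 7-character Base62 code (a-z, A-Z, 0-9) gives 62^7 ≈ 3.5 trillion unique codes. Avoid truncating an MD5/SHA digest to 7 characters, since truncated hashes can collide and every insert then needs a collision check; instead, convert an auto-incrementing ID to Base62, which is unique by construction.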
- Database: stores the short_code → long_url mapping; since reads dominate, aggressive caching is needed; Cassandra (scales well) or MySQL/PostgreSQL with a Redis cache are good options.
- Cache: 80% of traffic hits 20% of URLs (hot URLs) → cache top URLs in Redis with LRU eviction for very high cache hit rates.
- Redirect: 301 (permanent: the browser caches it, which reduces server load, but repeat clicks never reach the server and so cannot be tracked) vs 302 (temporary: the browser does not cache it by default, so each click is trackable).
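The ID-to-Base62 conversion from the Hash Generation item can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the alphabet ordering (digits, then lowercase, then uppercase) are assumptions, and any fixed ordering of the 62 symbols would work equally well.

```python
import string

# Assumed symbol order: 0-9, a-z, A-Z. Only consistency matters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)             # 62
CODE_LENGTH = 7
KEYSPACE = BASE ** CODE_LENGTH   # number of distinct 7-character codes


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to its Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a Base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(KEYSPACE)                     # 3521614606208
print(encode_base62(KEYSPACE - 1))  # ZZZZZZZ
```

At ~100M new URLs per day, a keyspace of this size would last roughly 35,000 days, or about 96 years, before 7 characters run out.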
Architecture: stateless API server → Redis cache → database; rate limiting to prevent abuse; custom domain support requires a DNS wildcard; analytics pipeline: click → Kafka → Spark → analytics DB.
Scaling: separate the read service (redirect) from the write service (create), since their load patterns differ significantly.
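The read path described above (check the cache, fall back to the database, then answer with a redirect) might look roughly like the sketch below. To stay self-contained it uses in-memory stand-ins: `LRUCache` is a hypothetical substitute for Redis with LRU eviction, a plain dict plays the database, and `resolve` and its `track_clicks` flag are illustrative names, not part of any real API.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple


class LRUCache:
    """In-memory stand-in for Redis with LRU eviction (assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def resolve(
    short_code: str,
    cache: LRUCache,
    database: Dict[str, str],
    track_clicks: bool = True,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target URL) for a short code."""
    long_url = cache.get(short_code)
    if long_url is None:
        long_url = database.get(short_code)  # cache miss: go to the database
        if long_url is None:
            return 404, None
        cache.put(short_code, long_url)      # populate the cache for next time
    # 302 keeps every click visible to the server; 301 lets browsers cache.
    return (302 if track_clicks else 301), long_url


db = {"abc1234": "https://example.com/some/long/path"}
cache = LRUCache(capacity=1000)
print(resolve("abc1234", cache, db))  # first call misses the cache
print(resolve("abc1234", cache, db))  # second call is served from the cache
```

Because the redirect handler only reads, it can be deployed and scaled separately from the create endpoint, which matches the read/write split suggested above.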