Requirements: multi-channel (push, email, SMS), high volume (hundreds of millions of notifications per day), reliable delivery, user preference support.
Architecture: Producer Services → Notification Service → Channel Handlers → Third-party providers.
- Notification Service: receives events (order shipped, friend request), looks up user preferences (channel, quiet hours, opt-out), and enqueues each notification into Kafka, using a separate topic per channel.
- Channel Workers: Push Notification Worker calls FCM (Android) / APNs (iOS); Email Worker calls SendGrid/SES; SMS Worker calls Twilio/SNS.
- Reliability: at-least-once delivery via Kafka; store notifications in the DB with status (pending/sent/failed); retry with exponential backoff; dead letter queue for permanent failures.
- Rate limiting per user: avoid spamming a user with a burst of notifications (say, 100 at once); aggregate or throttle instead.
- Notification template service: versioned templates with i18n support.
- User preference service: per-channel opt-in/out, quiet hours, and digest mode.
- Monitoring: delivery rate per channel, bounce/unsubscribe tracking, P99 latency.
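The reliability bullet above (status tracking, retry with exponential backoff, dead letter queue) can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `Notification`, `deliver`, `backoff_delay`, `PermanentError`, and `TransientError` are hypothetical, the `send` callable stands in for a real provider client, and a plain list stands in for the dead letter queue.

```python
import random
import time
from dataclasses import dataclass


class PermanentError(Exception):
    """Hypothetical: the provider rejected the message and a retry cannot help."""


class TransientError(Exception):
    """Hypothetical: the provider timed out or throttled; a retry may succeed."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound, in seconds, on the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


@dataclass
class Notification:
    id: str
    status: str = "pending"  # pending / sent / failed, mirroring the DB status
    attempts: int = 0


def deliver(notification, send, dead_letters, max_attempts=5, sleep=time.sleep):
    """Try to send; retry transient failures, dead-letter everything else."""
    for attempt in range(max_attempts):
        notification.attempts = attempt + 1
        try:
            send(notification)
            notification.status = "sent"
            return True
        except PermanentError:
            break  # no point retrying, go straight to the dead letter queue
        except TransientError:
            if attempt + 1 < max_attempts:
                # Full jitter (an assumption, not stated in the notes): a random
                # wait up to the exponential bound spreads out retry bursts.
                sleep(random.uniform(0, backoff_delay(attempt)))
    notification.status = "failed"
    dead_letters.append(notification)  # stand-in for a real dead letter topic
    return False
```

In a real worker the wait would more likely be implemented by re-enqueueing to a delayed retry topic than by sleeping in the consumer, so that one slow notification does not block the partition.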
The typical scale bottleneck is third-party API calls, so circuit breakers and fallback providers are essential. Because delivery is at-least-once, the same notification can be processed more than once; use idempotency keys to avoid sending duplicates on retry.
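One way to combine a circuit breaker, fallback providers, and idempotency keys is sketched below. Everything here is an illustrative assumption: `CircuitBreaker` and `Sender` are hypothetical names, the thresholds are arbitrary, provider calls are plain callables, and the in-memory set stands in for a shared store (for example, a unique constraint on the notifications table).

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: once the cool-down has elapsed, let a trial call through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)start the cool-down


class Sender:
    """Tries providers in order, skipping any whose circuit is open."""

    def __init__(self, providers):
        # providers: ordered list of (name, send_callable), primary first
        self.providers = [(name, fn, CircuitBreaker()) for name, fn in providers]
        self.seen_keys = set()  # in-memory stand-in for a shared dedup store

    def send(self, idempotency_key, payload):
        if idempotency_key in self.seen_keys:
            return "duplicate"  # already delivered; a retry must not resend
        for name, fn, breaker in self.providers:
            if not breaker.allow():
                continue
            try:
                fn(payload)
            except Exception:
                breaker.record_failure()
                continue  # fall back to the next provider
            breaker.record_success()
            self.seen_keys.add(idempotency_key)
            return name
        return None  # every provider failed or is open; caller should retry later
```

The key is recorded only after a provider accepts the message, so a failed attempt can still be retried with the same key. A production version would need the check and the record to be atomic across workers, which this single-process sketch does not attempt.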