Requirements: users see posts from people they follow, real-time updates, pagination support, ~500M users.
Core challenge: when user A posts, all of A's followers need to see that post in their feed.
- Fanout-on-write (push model): on publish, immediately push the post ID into the feed cache of every follower → very fast reads, but high write amplification: one post by a user with 1M followers triggers 1M cache writes.
- Fanout-on-read (pull model): when a user loads their feed, query every account they follow, then merge and sort the results → no write overhead, but reads are slow and expensive because the cost grows with the number of followed accounts.
- Hybrid approach (Facebook/Twitter): fanout-on-write for regular users (fewer than N followers); fanout-on-read for celebrities (more than N followers); merge the pre-computed feed with a real-time pull from celebrities.
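- Feed storage: Redis sorted set with timestamp as the score and post_id as the member; ZREVRANGE reads newest-first pages by rank, while a score-bounded query such as ZREVRANGEBYSCORE suits cursor-based pagination; a TTL on the key evicts stale feeds.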
- Post storage: a separate service; the feed holds only post IDs, and post content is fetched from the DB/cache when the feed is rendered.
- Ranking: chronological order is simplest; ML-based ranking (engagement prediction) is more complex but tends to keep users engaged longer.
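
The hybrid approach above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: plain dicts stand in for Redis and the post service, and the names `CELEBRITY_THRESHOLD`, `follow`, `publish`, and `load_feed` are hypothetical, chosen only for this example. The threshold plays the role of N and is set tiny here so the demo stays readable.

```python
import heapq
from collections import defaultdict

# Hypothetical stand-in for N; a real system would use a far larger value.
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)      # author_id -> ids of users following them
following = defaultdict(set)      # user_id -> ids of authors they follow
feed_cache = defaultdict(list)    # user_id -> [(timestamp, post_id)] pushed at write time
author_posts = defaultdict(list)  # author_id -> [(timestamp, post_id)] the author's own posts


def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD


def publish(author_id, post_id, timestamp):
    """Store the post; fan out on write only for non-celebrity authors.

    Returns the number of feed-cache writes performed.
    """
    author_posts[author_id].append((timestamp, post_id))
    if is_celebrity(author_id):
        return 0  # followers pull these posts at read time instead
    for follower_id in followers[author_id]:
        feed_cache[follower_id].append((timestamp, post_id))
    return len(followers[author_id])


def load_feed(user_id, limit=10):
    """Merge the precomputed feed with a read-time pull from followed celebrities."""
    # A set removes duplicates, e.g. posts pushed before an author became a celebrity.
    entries = set(feed_cache[user_id])
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            entries.update(author_posts[author_id])
    newest = heapq.nlargest(limit, entries)  # newest first by (timestamp, post_id)
    return [post_id for _, post_id in newest]


# Demo: "alice" is a regular user, "star" reaches the celebrity threshold.
follow("u1", "alice")
for fan in ("u1", "u2", "u3"):
    follow(fan, "star")
pushed_alice = publish("alice", "p1", timestamp=100)  # pushed to u1's cache
pushed_star = publish("star", "p2", timestamp=200)    # no fanout
feed_u1 = load_feed("u1")
```

The write cost of `publish` is bounded by the threshold, while `load_feed` pays an extra pull only for the (usually few) celebrities a user follows.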
Use cursor-based pagination instead of offset pagination to avoid missing or duplicate items as the feed changes.
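
To illustrate why the cursor matters, here is a small sketch that pages over the same feed with both strategies. It is an assumption-laden toy: the feed is a Python list of `(timestamp, post_id)` tuples rather than a Redis sorted set, and `get_page_by_cursor` and `get_page_by_offset` are hypothetical helper names. With Redis, the cursor variant would roughly correspond to a score-bounded range query instead of a rank-based one.

```python
def get_page_by_offset(feed, offset=0, limit=2):
    """Offset pagination: skip `offset` items from the top of the current feed."""
    ordered = sorted(feed, reverse=True)  # newest first
    return [post_id for _, post_id in ordered[offset:offset + limit]]


def get_page_by_cursor(feed, cursor=None, limit=2):
    """Cursor pagination: return items strictly older than the last item seen.

    `cursor` is the (timestamp, post_id) of the last item on the previous page,
    or None for the first page. Returns (post_ids, next_cursor).
    """
    ordered = sorted(feed, reverse=True)  # newest first
    if cursor is not None:
        ordered = [entry for entry in ordered if entry < cursor]
    page = ordered[:limit]
    next_cursor = page[-1] if page else None
    return [post_id for _, post_id in page], next_cursor


feed = [(100, "p1"), (200, "p2"), (300, "p3"), (400, "p4")]

first_page, cursor = get_page_by_cursor(feed)          # first page of 2 items
feed.append((500, "p5"))                               # a new post arrives between requests
second_page, _ = get_page_by_cursor(feed, cursor)      # continues below the cursor
second_page_offset = get_page_by_offset(feed, offset=2)  # shifted by the new post
```

After the new post arrives, the offset-based second page repeats `p3`, which the reader already saw, while the cursor-based page continues with `p2` and `p1`.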