Rebalancing occurs when the membership or subscription of a consumer group changes (a consumer is added or removed, a consumer crashes, or the subscription changes).
- During an eager rebalance, the entire group stops consuming (stop-the-world), which can cause a latency spike.
- Common problems: `session.timeout.ms` set too low causes consumers to be ejected from the group during long GC pauses; `max.poll.interval.ms` set too low causes consumers that process slowly to be considered dead.
- Optimizations: increase `session.timeout.ms` and, along with it, `heartbeat.interval.ms` (typically kept at no more than one third of the session timeout), reduce `max.poll.records`, and use incremental cooperative rebalancing (`partition.assignment.strategy=CooperativeStickyAssignor`) instead of eager rebalancing. With the cooperative protocol, only the partitions that actually need to move are revoked, so the whole group does not stop.
- Static group membership (`group.instance.id`) avoids a rebalance when a consumer restarts: the member keeps its previous partition assignment as long as it rejoins within `session.timeout.ms`.
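The settings above can be collected into a single consumer configuration and sanity-checked before deployment. The following is a minimal sketch, not a definitive setup: the property names are those of the Java consumer, while the numeric values, the group name `orders-processor`, and the helpers `build_consumer_config` and `check_consumer_config` are assumptions made for illustration.

```python
# Sketch of a rebalance-friendly consumer configuration.
# All numeric values are illustrative assumptions; tune them against
# your own GC pauses and per-record processing times.

def build_consumer_config(instance_id, max_poll_records=100):
    """Return a dict keyed by Java-consumer property names."""
    return {
        "group.id": "orders-processor",        # hypothetical group name
        "group.instance.id": instance_id,      # static membership: stable id per instance
        "session.timeout.ms": 45_000,          # tolerate long GC pauses
        "heartbeat.interval.ms": 15_000,       # one third of the session timeout
        "max.poll.interval.ms": 300_000,       # upper bound on time between poll() calls
        "max.poll.records": max_poll_records,  # smaller batches finish sooner
        "partition.assignment.strategy":
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
    }


def check_consumer_config(config, per_record_ms):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # Heartbeats should be sent several times within one session timeout.
    if config["heartbeat.interval.ms"] * 3 > config["session.timeout.ms"]:
        problems.append("heartbeat.interval.ms exceeds one third of session.timeout.ms")

    # Worst case: every record in a full batch takes per_record_ms to process.
    worst_case_batch_ms = config["max.poll.records"] * per_record_ms
    if worst_case_batch_ms >= config["max.poll.interval.ms"]:
        problems.append("a full batch may take longer than max.poll.interval.ms")

    return problems


if __name__ == "__main__":
    cfg = build_consumer_config("consumer-1")
    for problem in check_consumer_config(cfg, per_record_ms=5_000):
        print("WARNING:", problem)
```

With these assumed values, a batch of 100 records at 5 seconds each would take 500 seconds, which exceeds the 300-second `max.poll.interval.ms`; lowering `max.poll.records` (or raising the interval) removes the warning.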