31 câu hỏi phỏng vấn Kafka có đáp án

#1Apache Kafka là gì và tại sao nó được sử dụng phổ biến trong hệ thống phân tán?

Cơ Bản

Kafka là distributed event streaming platform cho phép xử lý dữ liệu real-time với throughput cao, latency thấp và khả năng replay message — lý do chính khiến nó phổ biến trong hệ thống phân tán.

Apache Kafka được LinkedIn phát triển và open-source năm 2011.
Kafka được thiết kế để xử lý luồng dữ liệu thời gian thực với thông lượng cực cao, độ trễ thấp và khả năng mở rộng theo chiều ngang.
Điểm mạnh của Kafka so với các message broker truyền thống là khả năng lưu trữ message lâu dài trên disk (không xóa sau khi consumer đọc), cho phép nhiều consumer đọc lại cùng một message.
Kafka thường được dùng trong các use case như event sourcing, log aggregation, real-time analytics, data pipeline giữa các microservices, và change data capture (CDC) từ database.

#2Giải thích kiến trúc Kafka: Broker, Topic, Partition, Consumer Group hoạt động như thế nào?

Cơ Bản

Kafka cluster gồm nhiều Broker lưu Topic, mỗi Topic chia thành Partition để song song hoá, và Consumer Group để scale đọc. Broker là một server Kafka chạy độc lập; một Kafka cluster thường có nhiều broker để đảm bảo high availability. Topic là kênh logic để phân loại message (ví dụ: topic 'orders', topic 'payments'). Mỗi topic được chia thành nhiều Partition — đây là đơn vị song song hóa của Kafka; các message trong một partition được sắp xếp theo thứ tự và mỗi message có một offset duy nhất. Consumer Group là nhóm các consumer cùng nhau đọc một topic; mỗi partition chỉ được đọc bởi đúng một consumer trong group tại một thời điểm, cho phép scale out việc tiêu thụ message.

Ví dụ: nếu topic có 6 partition và group có 3 consumer, mỗi consumer xử lý 2 partition.

#3Producer và Consumer trong Kafka hoạt động như thế nào? Các cấu hình quan trọng cần biết?

Cơ Bản

Producer gửi message đến một topic và Kafka tự động phân phối message vào các partition (theo key hash, round-robin, hoặc custom partitioner).

Cấu hình quan trọng của producer: acks (0=fire-and-forget, 1=leader ack, all=tất cả ISR ack), retries, batch.size và linger.ms để tối ưu throughput.
Consumer đọc message từ partition theo offset và có thể commit offset tự động (enable.auto.commit=true) hoặc thủ công.
Cấu hình quan trọng của consumer: auto.offset.reset (earliest/latest), max.poll.records, session.timeout.ms.
Trong thực tế, nên dùng manual commit để tránh mất message khi consumer crash trước khi xử lý xong.

#4Kafka Connect là gì? Nó giải quyết bài toán gì trong data pipeline?

Cơ Bản

Kafka Connect là framework tích hợp sẵn trong Kafka ecosystem để kết nối Kafka với các external system (database, file system, cloud storage, search engine) mà không cần viết code.

Kafka Connect có hai loại connector: Source Connector (đọc data từ external system vào Kafka, ví dụ: Debezium CDC từ PostgreSQL) và Sink Connector (ghi data từ Kafka ra external system, ví dụ: Elasticsearch Sink).
Connect chạy ở chế độ distributed với worker pool, tự động handle fault tolerance và load balancing.

Ví dụ thực tế: dùng Debezium Source Connector để capture mọi thay đổi trong MySQL database, publish vào Kafka topic, sau đó Elasticsearch Sink Connector index data để search — toàn bộ pipeline không cần viết một dòng code custom nào.

#5Các use case phổ biến nhất của Kafka trong thực tế là gì?

Cơ Bản

Kafka được dùng rộng rãi cho log aggregation, event sourcing, CDC, và microservices communication nhờ khả năng lưu trữ và replay message.

Các use case phổ biến của Kafka trong production:

Log aggregation: tập hợp log từ nhiều service vào một nơi, sau đó forward sang Elasticsearch hoặc S3 để phân tích.
Event sourcing: lưu tất cả sự kiện (thay vì chỉ state hiện tại) để rebuild state bất kỳ lúc nào.
Real-time analytics: stream data từ user activity vào Kafka, xử lý với Kafka Streams hoặc Flink để cập nhật dashboard real-time.
Microservices communication: thay vì gọi API trực tiếp (tight coupling), các service publish event và service khác subscribe — giảm coupling và tăng resilience.
Change Data Capture (CDC): dùng Debezium để capture mọi thay đổi từ database, sync sang data warehouse hoặc invalidate cache.
Fraud detection: stream giao dịch tài chính qua Kafka, dùng ML model để detect bất thường trong real-time.

#6Offset management trong Kafka là gì? Phân biệt auto commit và manual commit?

Trung Bình

Offset là số thứ tự của message trong một partition, bắt đầu từ 0.

Kafka lưu offset của consumer vào một internal topic tên __consumer_offsets.
Auto commit (enable.auto.commit=true) sẽ tự động commit offset theo chu kỳ auto.commit.interval.ms (mặc định 5000ms), nhưng có thể dẫn đến mất message nếu consumer crash SAU khi auto-commit nhưng TRƯỚC khi xử lý xong message đó, hoặc xử lý trùng nếu consumer crash sau khi xử lý nhưng trước khi auto commit.
Manual commit (commitSync() hoặc commitAsync()) cho phép kiểm soát chính xác: chỉ commit sau khi xử lý thành công. commitSync() block cho đến khi commit thành công, commitAsync() không block nhưng cần callback để xử lý lỗi.
Trong production, nên dùng manual commit kết hợp với idempotent processing để đảm bảo at-least-once hoặc exactly-once semantics.

#7Message ordering trong Kafka được đảm bảo như thế nào? Khi nào ordering bị phá vỡ?

Trung Bình

Kafka đảm bảo thứ tự message trong phạm vi một partition — các message được ghi và đọc theo đúng thứ tự FIFO.

Tuy nhiên, không có đảm bảo thứ tự giữa các partition khác nhau.
Để đảm bảo ordering cho một nhóm message liên quan (ví dụ: tất cả sự kiện của một user), cần gán cùng một message key — Kafka sẽ hash key và luôn route message có cùng key vào cùng partition.
Ordering có thể bị phá vỡ khi: producer bật retries > 0 và max.in.flight.requests.per.connection > 1 (message sau có thể arrive trước message trước bị retry) — giải pháp là bật enable.idempotence=true.
Ngoài ra, rebalancing consumer group không ảnh hưởng đến thứ tự trong partition vì offset được duy trì.

#8Replication và ISR (In-Sync Replicas) trong Kafka là gì? Cách Kafka đảm bảo fault tolerance?

Trung Bình

Mỗi partition có một leader và nhiều follower replica trên các broker khác nhau.

Leader xử lý tất cả read/write, follower chủ động pull data từ leader để sync.
ISR (In-Sync Replicas) là tập hợp các replica đang sync kịp với leader (không bị lag quá replica.lag.time.max.ms).
Khi leader broker bị lỗi, một replica trong ISR sẽ được bầu làm leader mới.
Cấu hình replication.factor=3 và min.insync.replicas=2 kết hợp với acks=all đảm bảo message chỉ được acknowledge khi ít nhất 2 replica đã ghi — bảo vệ khỏi mất data kể cả khi 1 broker bị lỗi.
Trong production nên dùng replication.factor >= 3 để chịu được lỗi của 2 broker đồng thời.

#9Kafka Streams là gì? Khác gì so với việc consume message thông thường?

Trung Bình

Kafka Streams là thư viện Java/Scala để xây dựng ứng dụng stream processing trực tiếp trên Kafka, không cần external cluster như Spark hay Flink. Kafka Streams cung cấp các operation high-level như filter, map, groupBy, aggregate, join giữa các stream. Điểm khác biệt: consumer thông thường chỉ đọc và xử lý message, còn Kafka Streams có khái niệm KStream (unbounded stream of events) và KTable (changelog stream, represents current state), cho phép stateful processing với local state store (RocksDB). Kafka Streams tự động handle partitioning, scaling, và fault recovery — khi thêm instance, Kafka tự rebalance partition.

Ví dụ: tính tổng doanh thu mỗi 5 phút từ stream order events, join stream clicks với stream purchases để tính conversion rate.

#10Idempotent producer trong Kafka là gì? Cách bật và hoạt động như thế nào?

Trung Bình

Idempotent producer đảm bảo rằng dù producer retry gửi message bao nhiêu lần (do network error, timeout), mỗi message chỉ được ghi đúng một lần vào partition. Cách hoạt động: Kafka assign cho mỗi producer một Producer ID (PID) duy nhất và mỗi message có sequence number tăng dần per-partition. Broker reject nếu thấy message có sequence number đã nhận hoặc không liên tiếp. Bật bằng cách set enable.idempotence=true — tự động set acks=all, max.in.flight.requests.per.connection=5, retries=Integer.MAX_VALUE. Idempotence chỉ trong một producer session — nếu producer restart, PID mới và sequence reset. Để exactly-once across sessions và multiple partitions, cần Kafka Transactions.

Ví dụ:

props.put("enable.idempotence", "true");
props.put("transactional.id", "my-tx-id"); // cho transactions

#11Producer batching trong Kafka: linger.ms và batch.size hoạt động thế nào để tối ưu throughput?

Trung Bình

Mặc định Kafka producer gửi message ngay lập tức (linger.ms=0), nhưng cách này kém hiệu quả khi throughput cao vì mỗi request chứa ít record. batch.size (default 16384 bytes = 16KB) là kích thước tối đa của một batch per-partition — khi batch đầy, gửi ngay. linger.ms (default 0ms) là thời gian producer chờ thêm message trước khi gửi dù batch chưa đầy — tương tự Nagle's algorithm của TCP. Tuning throughput: tăng batch.size lên 128KB–1MB, thêm linger.ms=10–50ms để gom nhiều message hơn, bật compression.type=lz4 hoặc snappy. Kết quả: throughput tăng 3-5x, latency tăng nhẹ (chấp nhận được cho non-realtime). Với low-latency use case (trading, alerting) thì giữ linger.ms=0.

Trade-off: linger.ms cao → throughput tốt, latency cao; linger.ms=0 → latency thấp nhất, throughput thấp hơn.

#12At-least-once vs at-most-once vs exactly-once trong Kafka: phân biệt và cách đạt được mỗi loại?

Trung Bình

At-most-once (fire-and-forget): producer acks=0, consumer commit offset trước khi xử lý — message có thể mất, không bao giờ duplicate.

Dùng cho metrics/log không cần chính xác tuyệt đối. At-least-once: producer acks=all + retries, consumer commit sau khi xử lý thành công — không mất message nhưng có thể xử lý trùng (duplicate) khi consumer crash sau xử lý nhưng trước commit.
Yêu cầu consumer logic phải idempotent. Exactly-once: khó nhất, cần cả producer và consumer phối hợp.
Producer: bật enable.idempotence=true + Kafka Transactions (transactional.id).
Consumer: isolation.level=read_committed + đặt offset commit trong cùng transaction với business logic (nếu write ra DB phải là transactional outbox).
Kafka Streams với processing.guarantee=exactly_once_v2 tự handle toàn bộ.
Exactly-once có overhead ~20-30% latency — chỉ dùng cho billing, financial transactions.

#13Consumer group và partition assignment: cách Kafka phân phối partition cho consumers?

Trung Bình

Kafka Group Coordinator (broker) quản lý consumer group lifecycle và partition assignment.

Khi consumer join group, Group Coordinator trigger rebalance và chọn một consumer làm Group Leader — Leader thực hiện partition assignment theo strategy đã cấu hình và gửi kết quả về Coordinator. RangeAssignor (default): assign partition liên tiếp theo topic — consumer 0 nhận partition 0,1; consumer 1 nhận 2,3.
Nếu partition không chia đều, consumer đầu tiên nhận nhiều hơn (uneven distribution across topics). RoundRobinAssignor: phân phối đều hơn bằng cách xen kẽ — mỗi consumer nhận partition luân phiên. StickyAssignor: cố gắng minimize sự thay đổi so với assignment trước. CooperativeStickyAssignor: như Sticky nhưng dùng cooperative protocol (không stop-the-world) — khuyến nghị cho ứng dụng mới theo KIP-429.
Thực tế: số consumer không nên vượt số partition — consumer thừa sẽ idle.
Khi scale out, thêm partition TRƯỚC khi thêm consumer để tránh idle consumers.

#14Partition key design trong Kafka: cách chọn key để tránh hotspot và đảm bảo ordering?

Trung Bình

Partition key quyết định message vào partition nào (hash(key) % numPartitions).

Chọn key sai dẫn đến: hotspot (một partition bị overload), hoặc mất ordering (message liên quan vào partition khác nhau). Nguyên tắc chọn key: chọn field có cardinality cao và phân phối đều — user_id, order_id, device_id tốt hơn country_code hay status.
Key phải là field mà ordering quan trọng trong context của nó: tất cả event của cùng order_id phải theo thứ tự → dùng order_id làm key. Tránh hotspot: nếu key phân phối không đều (một số user_id hoạt động cực nhiều), thêm random suffix: user_id + '-' + random(0, N) — nhưng sẽ mất ordering global per user.
Kỹ thuật khác: key salting key + timestamp_bucket — partition thay đổi theo thời gian.
Tăng số partition cũng giúp giảm hotspot nhưng không giải quyết root cause nếu key quá skewed.

#15Exactly-once semantics (EOS) trong Kafka hoạt động như thế nào? Giải thích idempotent producers và transactions.

Nâng Cao

Kafka hỗ trợ 3 delivery semantics: at-most-once (có thể mất message), at-least-once (có thể duplicate), và exactly-once. Idempotent producer (enable.idempotence=true) đảm bảo mỗi message chỉ được ghi đúng một lần vào một partition kể cả khi retry — producer được gán PID và mỗi message có sequence number, broker reject duplicate. Kafka Transactions cho phép atomic write across multiple partitions/topics: producer dùng initTransactions(), beginTransaction(), write messages, commitTransaction() hoặc abortTransaction(). Consumer dùng isolation.level=read_committed để chỉ đọc message từ committed transaction. EOS trong Kafka Streams được bật bằng processing.guarantee=exactly_once_v2.

Lưu ý: EOS có overhead về latency và throughput (~20-30%), chỉ dùng khi thực sự cần thiết (tài chính, billing).

#16Các chiến lược partition trong Kafka: khi nào dùng key-based, round-robin, hay custom partitioner?

Nâng Cao

Round-robin (khi message key là null): phân phối đều message sang các partition, tối ưu throughput nhưng không đảm bảo ordering.

Key-based partitioning (hash của key % numPartitions): đảm bảo tất cả message cùng key vào một partition — bắt buộc khi cần ordering (ví dụ: tất cả event của user_id=123 phải theo thứ tự).
Vấn đề hotspot: nếu key phân phối không đều (ví dụ: một số user_id hoạt động cực kỳ nhiều), một số partition sẽ overloaded — giải pháp là thêm random suffix vào key hoặc dùng custom partitioner với logic phức tạp hơn.
Custom partitioner cho phép route theo business logic (ví dụ: VIP customer vào partition riêng để ưu tiên xử lý).
Khi tăng số partition của existing topic, message key cũ có thể bị route sang partition khác — cần planning kỹ trước khi deploy.

#17Consumer rebalancing trong Kafka: vấn đề gì có thể xảy ra và cách tối ưu?

Nâng Cao

Rebalancing xảy ra khi consumer group thay đổi (thêm/xóa consumer, consumer crash, hoặc subscription thay đổi).

Trong quá trình rebalance, toàn bộ group dừng consume (stop-the-world), có thể gây latency spike.
Vấn đề phổ biến: session.timeout.ms quá ngắn → consumer bị kick ra group vì GC pause dài; max.poll.interval.ms quá ngắn → consumer xử lý lâu bị coi là dead.
Tối ưu: tăng session.timeout.ms và heartbeat.interval.ms, giảm max.poll.records, dùng incremental cooperative rebalancing (partition.assignment.strategy=CooperativeStickyAssignor) thay vì eager rebalancing — chỉ revoke partition thực sự cần chuyển, không dừng toàn bộ group.
Static group membership (group.instance.id) giúp tránh rebalance khi restart consumer (reuse partition assignment cũ trong vòng session.timeout.ms).

#18Dead Letter Queue (DLQ) trong Kafka là gì? Cách implement xử lý message lỗi?

Nâng Cao

Kafka không có DLQ built-in như RabbitMQ, nhưng pattern DLQ được implement bằng cách: khi consumer fail xử lý một message sau N lần retry, thay vì block toàn bộ partition, message được forward sang một topic DLQ riêng (ví dụ: orders.DLT) kèm theo metadata (exception, timestamp, original topic, partition, offset).

Kafka có Spring Kafka @RetryableTopic và @DltHandler để implement pattern này tự động với exponential backoff.
Cần cân nhắc: nếu dừng xử lý để retry, các message sau sẽ bị delay (ordering preserved nhưng throughput giảm); nếu skip và gửi DLQ, ordering bị phá vỡ nhưng throughput không bị ảnh hưởng.
Monitoring DLQ là critical — nên alert khi DLQ có message, có team xử lý manual hoặc replay sau khi fix bug.
Một best practice khác là dùng retry topic với tên topic.RETRY-1, topic.RETRY-2 với delay tăng dần.

#19Monitoring và performance tuning Kafka: các metrics quan trọng và cách tối ưu throughput/latency?

Nâng Cao

Metrics quan trọng cần monitor: UnderReplicatedPartitions (>0 là dấu hiệu vấn đề replication), OfflinePartitionsCount (cần alert ngay khi >0), BytesInPerSec/BytesOutPerSec (throughput), RequestHandlerAvgIdlePercent (<0.2 là broker overloaded), consumer lag (records-lag-max) để detect consumer chậm.

Tối ưu throughput producer: tăng batch.size (16KB→128KB), thêm linger.ms (0→20ms), bật compression.type=lz4 giảm network I/O.
Tối ưu throughput consumer: tăng fetch.min.bytes và fetch.max.wait.ms để fetch theo batch lớn, tăng max.poll.records.
Tối ưu broker: tăng số thread I/O (num.io.threads), dùng dedicated disk cho Kafka log (tránh share với OS), đặt log.dirs trên multiple disk để parallel I/O.
Dùng Kafka Exporter + Prometheus + Grafana cho observability stack.

#20Kafka trong kiến trúc microservices: event-driven architecture, saga pattern, và các pitfall cần tránh?

Nâng Cao

Kafka là backbone của event-driven microservices: service A publish event (OrderCreated), service B và C subscribe để thực hiện action tương ứng (PaymentService charge, InventoryService deduct).

Saga pattern dùng Kafka để coordinate distributed transaction: choreography-based saga — mỗi service lắng nghe event và publish event tiếp theo, không có central coordinator; orchestration-based saga — Saga Orchestrator gửi command đến từng service qua Kafka và lắng nghe response.

Pitfall cần tránh: event schema thay đổi breaking consumer (dùng Schema Registry + Avro để enforce backward/forward compatibility); consumer idempotency — cùng event có thể arrive nhiều lần, mọi handler phải idempotent; transaction outbox pattern để đảm bảo atomicity giữa database write và Kafka publish (tránh case: DB write thành công nhưng Kafka publish fail hoặc ngược lại).

#21Schema Registry trong Kafka ecosystem là gì? Tại sao cần dùng Avro/Protobuf thay vì JSON?

Nâng Cao

Schema Registry (Confluent) là service lưu trữ và quản lý schema của Kafka message, đảm bảo producer và consumer đồng thuận về format dữ liệu.

JSON không có schema enforcement — producer có thể thay đổi field name/type mà không báo, khiến consumer bị lỗi.
Avro và Protobuf là binary serialization formats: nhỏ hơn JSON ~3-10x (giảm storage và network cost), có schema evolution với compatibility rules (BACKWARD, FORWARD, FULL).
Schema Registry lưu schema theo subject (mặc định là topic name), assign schema ID; message chỉ chứa schema ID (4 bytes) thay vì full schema, consumer lookup schema từ registry khi cần.
Khi thêm field mới với default value (BACKWARD compatible), consumer cũ vẫn deserialize được; xóa field hoặc đổi type là BREAKING change bị Schema Registry reject nếu cấu hình không cho phép.
Đây là best practice bắt buộc trong production Kafka cluster.

#22KRaft mode trong Kafka là gì? Tại sao Kafka loại bỏ ZooKeeper?

Nâng Cao

Trước Kafka 3.3, ZooKeeper được dùng để manage cluster metadata (broker registration, partition leader election, topic config). ZooKeeper là một external dependency phức tạp, tạo ra operational overhead (phải manage thêm một cluster), giới hạn số partition (~200k partition/cluster vì ZooKeeper bottleneck), và có vấn đề về split-brain scenario. KRaft (Kafka Raft Metadata mode) thay thế ZooKeeper bằng cách dùng Raft consensus protocol ngay trong Kafka cluster — một số broker được bầu làm controller voter.

Lợi ích: đơn giản hóa deployment (không cần ZooKeeper), tăng giới hạn partition lên hàng triệu, metadata propagation nhanh hơn, startup/failover time giảm từ phút xuống giây. KRaft trở nên production-ready từ Kafka 3.3 (phát hành 10/2022). Kafka 4.0 (phát hành 3/2025) đã hoàn toàn loại bỏ ZooKeeper — mọi cluster mới phải dùng KRaft. Trong production mới nên bắt đầu với KRaft mode.

#23Kafka consumer lag là gì? Cách monitor và alert khi lag quá cao?

Nâng Cao

Consumer lag = (Log End Offset của partition) - (Current Offset của consumer group) = số message consumer chưa xử lý.

Lag cao đồng nghĩa consumer đang xử lý chậm hơn producer ghi vào, có thể dẫn đến message bị expire (nếu log.retention.hours ngắn) hoặc data trễ quá mức chấp nhận.
Monitor lag:

bash

# Kafka CLI
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group

# Kết quả: TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG

Prometheus stack: Kafka Exporter (kafka_consumergroup_lag) + Grafana dashboard.
Alert khi: lag vượt threshold (ví dụ > 10,000 records), lag tăng liên tục qua nhiều scrape interval.
Nguyên nhân lag cao: consumer xử lý chậm (heavy business logic, DB call), GC pause, rebalancing, downstream bottleneck.
Giải pháp: tăng số consumer (nếu còn partition chưa được assign), optimize processing logic, tăng max.poll.records, cân nhắc async processing.

#24Log compaction trong Kafka là gì? Tombstone record dùng để làm gì?

Nâng Cao

Log compaction là cơ chế Kafka chỉ giữ lại message mới nhất cho mỗi message key, xóa các phiên bản cũ — thay vì xóa theo thời gian (log.retention.hours).

Cấu hình: cleanup.policy=compact. Ứng dụng: KTable trong Kafka Streams (compacted topic lưu state mới nhất), changelog topic của Kafka Connect, event sourcing snapshot (chỉ cần state cuối). Tombstone record: message có key nhưng value=null — báo hiệu cho log compaction xóa tất cả message có key đó khỏi compacted log.
Dùng để xóa data khỏi KTable hoặc downstream systems (GDPR right-to-erasure).
Log compaction không chạy real-time: log.cleaner.min.cleanable.ratio và log.cleaner.backoff.ms control tần suất.
Active segment (segment đang ghi) không bao giờ bị compact — chỉ inactive segments.
Consumer đọc compacted topic có thể thấy cả message cũ và mới trong một scrub cycle.

#25min.insync.replicas và acks=all kết hợp bảo vệ data như thế nào? Khi nào producer throw NotEnoughReplicasException?

Nâng Cao

acks=all yêu cầu leader đợi tất cả ISR (In-Sync Replicas) ghi xong mới acknowledge producer. min.insync.replicas (broker/topic config, default=1) quy định số replica tối thiểu trong ISR để accept write. Kết hợp tối ưu: replication.factor=3, min.insync.replicas=2, acks=all → message chỉ acknowledged khi ít nhất 2 replica ghi xong → chịu được 1 broker failure mà không mất data. NotEnoughReplicasException xảy ra khi ISR size < min.insync.replicas — ví dụ 2 trong 3 broker down, ISR chỉ còn 1 nhưng min.insync.replicas=2 → producer bị reject.

Đây là trade-off giữa availability và durability.
Nếu muốn availability hơn durability: giảm min.insync.replicas=1 (có thể mất data nếu leader fail trước follower sync).
Trong production tài chính: replication.factor=3, min.insync.replicas=2, acks=all là cấu hình bắt buộc.

#26MirrorMaker 2 là gì? Cách thực hiện cross-cluster replication trong Kafka?

Nâng Cao

MirrorMaker 2 (MM2) là công cụ replication chính thức của Kafka để sao chép data giữa các Kafka cluster (active-passive DR, active-active multi-region, migration).

MM2 xây trên Kafka Connect framework: dùng MirrorSourceConnector (replicate messages), MirrorCheckpointConnector (replicate consumer offsets), MirrorHeartbeatConnector (monitor replication lag).
Topic được replicate với prefix: topic orders trên cluster A trở thành clusterA.orders trên cluster B — tránh naming conflict.
Offset translation: MM2 translate consumer offsets giữa cluster để consumer có thể failover sang cluster B và tiếp tục từ đúng vị trí.
Config cơ bản:

properties

clusters = source, target
source.bootstrap.servers = source:9092
target.bootstrap.servers = target:9092
source->target.enabled = true
source->target.topics = .*

Alternative: Confluent Replicator (commercial), hoặc dùng Kafka application-level replication (consumer từ source, produce sang target).

#27ksqlDB là gì? Khác gì so với Kafka Streams?

Nâng Cao

ksqlDB là engine SQL-based stream processing chạy trên Kafka, cho phép query và transform Kafka topics bằng SQL syntax — không cần viết Java/Scala. Phù hợp cho data engineers và analysts cần real-time analytics nhanh. Kafka Streams là thư viện Java — cần viết code, compile, deploy như một microservice — linh hoạt hơn, production-grade hơn cho complex logic. So sánh:
- ksqlDB: CREATE STREAM orders_by_user AS SELECT user_id, COUNT(*) FROM orders GROUP BY user_id EMIT CHANGES; — zero-code deployment
- Kafka Streams: code Java với KStream/KTable API, test với TopologyTestDriver

KsqlDB chạy trên ksqlDB Server cluster riêng (không phải Kafka broker). Pull queries (point-in-time query from materialized view) và Push queries (continuous streaming). Use case: real-time dashboard, filtering, joining streams cho business analytics. Kafka Streams tốt hơn cho: complex stateful logic, unit testing, CI/CD pipeline, embedding trong microservice.

#28Event sourcing với Kafka: pattern hoạt động như thế nào và lợi ích/nhược điểm?

Nâng Cao

Event sourcing lưu trữ mọi thay đổi state của domain object dưới dạng immutable event (append-only log) thay vì chỉ lưu state hiện tại.

Kafka là storage backend lý tưởng: durable, ordered, replayable.

Pattern hoạt động:

Mọi action (OrderPlaced, PaymentProcessed, OrderShipped) được publish như event vào Kafka topic;
Current state được derive bằng cách replay tất cả event từ đầu hoặc từ snapshot gần nhất;
CQRS thường đi kèm: Command side (write) publish event → Kafka → Event handler rebuild read model cho Query side. Lợi ích: audit trail đầy đủ, time-travel debugging, eventual consistency tự nhiên, dễ scale read side. Nhược điểm: eventual consistency phức tạp hơn strong consistency; query current state cần rebuild từ events (giải quyết bằng snapshot và read model); schema evolution khó (events immutable nên không thể sửa); storage lớn theo thời gian

Dùng log compaction để snapshot state, giảm replay time.

#29Transactional Outbox Pattern là gì? Tại sao cần thiết khi dùng Kafka với database?

Nâng Cao

Vấn đề: khi một service cần cập nhật DB và publish event lên Kafka trong cùng một operation, nếu không có coordination: DB commit thành công nhưng Kafka publish fail → inconsistency (event mất); hoặc Kafka publish thành công nhưng DB rollback → inconsistency (event giả).

Hai thao tác này không thể wrap trong 2-phase commit thực tế. Outbox pattern giải quyết bằng cách:

Trong cùng DB transaction, write business data VÀ write event vào bảng outbox trong DB;
Một separate process (Outbox Processor/CDC) đọc bảng outbox và publish lên Kafka;
Sau khi publish thành công, đánh dấu event là processed (hoặc xóa)

Implement bằng Debezium CDC (capture thay đổi outbox table trực tiếp từ DB transaction log) → zero-polling-delay.

Nhược điểm: thêm độ phức tạp, publish có thể bị delay nhỏ.

Đây là pattern bắt buộc trong distributed system production-grade.

#30Kafka vs RabbitMQ: so sánh chi tiết trade-offs để chọn đúng cho từng use case?

Nâng Cao

Kafka phù hợp cho event streaming với replay và throughput cao; RabbitMQ phù hợp cho task queue với routing phức tạp và TTL ngắn.

Kafka:
- Pull model: consumer tự pull theo tốc độ riêng
- Message retention: lưu trên disk theo retention policy (ngày/GB), có thể replay
- Throughput: hàng triệu msg/s, horizontal scale via partition
- Ordering: đảm bảo trong partition
- Routing: đơn giản (topic/partition), không có flexible routing
- Protocol: Kafka binary protocol
- Use case: event streaming, CDC, log aggregation, data pipeline, event sourcing

RabbitMQ:
- Push model: broker push đến consumer
- Message: xóa sau khi consumer ack
- Throughput: cao với quorum queues (500k+ msg/s trong equivalent workloads), scale phức tạp hơn Kafka
- Ordering: best-effort (không đảm bảo với multiple consumers)
- Routing: flexible (fanout/direct/topic/headers exchange)
- Protocol: AMQP, STOMP, MQTT
- Use case: task queue, RPC, microservices command, priority queue, complex routing

Chọn Kafka khi: cần replay, audit trail, multiple independent consumers, throughput > 100k/s, event-driven architecture.
Chọn RabbitMQ khi: cần complex routing, message priority, request-reply, team quen với message broker truyền thống.

#31CQRS pattern với Kafka: Command side và Query side tách biệt như thế nào?

Nâng Cao

CQRS (Command Query Responsibility Segregation) tách biệt model cho write (Command) và read (Query).

Kafka là event bus lý tưởng để kết nối hai side:

Command side: nhận command (CreateOrder), validate, update write model (DB), publish event (OrderCreated) lên Kafka;
Event consumer: subscribe Kafka topic, xử lý event để update read model (search index, cache, materialized view phù hợp cho query);
Query side: đọc từ read model đã được optimize (Elasticsearch cho full-text search, Redis cho low-latency lookup, PostgreSQL view cho reports)

Lợi ích: write và read model có thể scale độc lập; read model optimize cho access pattern cụ thể; dễ add thêm read model mới mà không ảnh hưởng write side. Nhược điểm: eventual consistency — sau khi write, read model chưa update ngay (thường < 100ms nhưng phải communicate với user).

Cần careful UX: sau khi user tạo order, show optimistic UI thay vì đọc lại từ DB ngay.

#1Apache Kafka là gì và tại sao nó được sử dụng phổ biến trong hệ thống phân tán?

Cơ Bản

Apache Kafka được LinkedIn phát triển và open-source năm 2011.
Kafka được thiết kế để xử lý luồng dữ liệu thời gian thực với thông lượng cực cao, độ trễ thấp và khả năng mở rộng theo chiều ngang.
Điểm mạnh của Kafka so với các message broker truyền thống là khả năng lưu trữ message lâu dài trên disk (không xóa sau khi consumer đọc), cho phép nhiều consumer đọc lại cùng một message.
Kafka thường được dùng trong các use case như event sourcing, log aggregation, real-time analytics, data pipeline giữa các microservices, và change data capture (CDC) từ database.

#2Giải thích kiến trúc Kafka: Broker, Topic, Partition, Consumer Group hoạt động như thế nào?

Cơ Bản

Ví dụ: nếu topic có 6 partition và group có 3 consumer, mỗi consumer xử lý 2 partition.

#3Producer và Consumer trong Kafka hoạt động như thế nào? Các cấu hình quan trọng cần biết?

Cơ Bản

Producer gửi message đến một topic và Kafka tự động phân phối message vào các partition (theo key hash, round-robin, hoặc custom partitioner).

Cấu hình quan trọng của producer: acks (0=fire-and-forget, 1=leader ack, all=tất cả ISR ack), retries, batch.size và linger.ms để tối ưu throughput.
Consumer đọc message từ partition theo offset và có thể commit offset tự động (enable.auto.commit=true) hoặc thủ công.
Cấu hình quan trọng của consumer: auto.offset.reset (earliest/latest), max.poll.records, session.timeout.ms.
Trong thực tế, nên dùng manual commit để tránh mất message khi consumer crash trước khi xử lý xong.

#4Kafka Connect là gì? Nó giải quyết bài toán gì trong data pipeline?

Cơ Bản

Kafka Connect có hai loại connector: Source Connector (đọc data từ external system vào Kafka, ví dụ: Debezium CDC từ PostgreSQL) và Sink Connector (ghi data từ Kafka ra external system, ví dụ: Elasticsearch Sink).
Connect chạy ở chế độ distributed với worker pool, tự động handle fault tolerance và load balancing.

#5Các use case phổ biến nhất của Kafka trong thực tế là gì?

Cơ Bản

Kafka được dùng rộng rãi cho log aggregation, event sourcing, CDC, và microservices communication nhờ khả năng lưu trữ và replay message.

Các use case phổ biến của Kafka trong production:

Log aggregation: tập hợp log từ nhiều service vào một nơi, sau đó forward sang Elasticsearch hoặc S3 để phân tích.
Event sourcing: lưu tất cả sự kiện (thay vì chỉ state hiện tại) để rebuild state bất kỳ lúc nào.
Real-time analytics: stream data từ user activity vào Kafka, xử lý với Kafka Streams hoặc Flink để cập nhật dashboard real-time.
Microservices communication: thay vì gọi API trực tiếp (tight coupling), các service publish event và service khác subscribe — giảm coupling và tăng resilience.
Change Data Capture (CDC): dùng Debezium để capture mọi thay đổi từ database, sync sang data warehouse hoặc invalidate cache.
Fraud detection: stream giao dịch tài chính qua Kafka, dùng ML model để detect bất thường trong real-time.

#6Offset management trong Kafka là gì? Phân biệt auto commit và manual commit?

Trung Bình

Offset là số thứ tự của message trong một partition, bắt đầu từ 0.

Kafka lưu offset của consumer vào một internal topic tên __consumer_offsets.
Auto commit (enable.auto.commit=true) sẽ tự động commit offset theo chu kỳ auto.commit.interval.ms (mặc định 5000ms), nhưng có thể dẫn đến mất message nếu consumer crash SAU khi auto-commit nhưng TRƯỚC khi xử lý xong message đó, hoặc xử lý trùng nếu consumer crash sau khi xử lý nhưng trước khi auto commit.
Manual commit (commitSync() hoặc commitAsync()) cho phép kiểm soát chính xác: chỉ commit sau khi xử lý thành công. commitSync() block cho đến khi commit thành công, commitAsync() không block nhưng cần callback để xử lý lỗi.
Trong production, nên dùng manual commit kết hợp với idempotent processing để đảm bảo at-least-once hoặc exactly-once semantics.

#7Message ordering trong Kafka được đảm bảo như thế nào? Khi nào ordering bị phá vỡ?

Trung Bình

Kafka đảm bảo thứ tự message trong phạm vi một partition — các message được ghi và đọc theo đúng thứ tự FIFO.

Tuy nhiên, không có đảm bảo thứ tự giữa các partition khác nhau.
Để đảm bảo ordering cho một nhóm message liên quan (ví dụ: tất cả sự kiện của một user), cần gán cùng một message key — Kafka sẽ hash key và luôn route message có cùng key vào cùng partition.
Ordering có thể bị phá vỡ khi: producer bật retries > 0 và max.in.flight.requests.per.connection > 1 (message sau có thể arrive trước message trước bị retry) — giải pháp là bật enable.idempotence=true.
Ngoài ra, rebalancing consumer group không ảnh hưởng đến thứ tự trong partition vì offset được duy trì.

#8Replication và ISR (In-Sync Replicas) trong Kafka là gì? Cách Kafka đảm bảo fault tolerance?

Trung Bình

Mỗi partition có một leader và nhiều follower replica trên các broker khác nhau.

Leader xử lý tất cả read/write, follower chủ động pull data từ leader để sync.
ISR (In-Sync Replicas) là tập hợp các replica đang sync kịp với leader (không bị lag quá replica.lag.time.max.ms).
Khi leader broker bị lỗi, một replica trong ISR sẽ được bầu làm leader mới.
Cấu hình replication.factor=3 và min.insync.replicas=2 kết hợp với acks=all đảm bảo message chỉ được acknowledge khi ít nhất 2 replica đã ghi — bảo vệ khỏi mất data kể cả khi 1 broker bị lỗi.
Trong production nên dùng replication.factor >= 3 để chịu được lỗi của 2 broker đồng thời.

#9Kafka Streams là gì? Khác gì so với việc consume message thông thường?

Trung Bình

Ví dụ: tính tổng doanh thu mỗi 5 phút từ stream order events, join stream clicks với stream purchases để tính conversion rate.

#10Idempotent producer trong Kafka là gì? Cách bật và hoạt động như thế nào?

Trung Bình

Ví dụ:

props.put("enable.idempotence", "true");
props.put("transactional.id", "my-tx-id"); // cho transactions

#11Producer batching trong Kafka: linger.ms và batch.size hoạt động thế nào để tối ưu throughput?

Trung Bình

Trade-off: linger.ms cao → throughput tốt, latency cao; linger.ms=0 → latency thấp nhất, throughput thấp hơn.

#12At-least-once vs at-most-once vs exactly-once trong Kafka: phân biệt và cách đạt được mỗi loại?

Trung Bình

At-most-once (fire-and-forget): producer acks=0, consumer commit offset trước khi xử lý — message có thể mất, không bao giờ duplicate.

Dùng cho metrics/log không cần chính xác tuyệt đối. At-least-once: producer acks=all + retries, consumer commit sau khi xử lý thành công — không mất message nhưng có thể xử lý trùng (duplicate) khi consumer crash sau xử lý nhưng trước commit.
Yêu cầu consumer logic phải idempotent. Exactly-once: khó nhất, cần cả producer và consumer phối hợp.
Producer: bật enable.idempotence=true + Kafka Transactions (transactional.id).
Consumer: isolation.level=read_committed + đặt offset commit trong cùng transaction với business logic (nếu write ra DB phải là transactional outbox).
Kafka Streams với processing.guarantee=exactly_once_v2 tự handle toàn bộ.
Exactly-once có overhead ~20-30% latency — chỉ dùng cho billing, financial transactions.

#13Consumer group và partition assignment: cách Kafka phân phối partition cho consumers?

Trung Bình

Kafka Group Coordinator (broker) quản lý consumer group lifecycle và partition assignment.

Khi consumer join group, Group Coordinator trigger rebalance và chọn một consumer làm Group Leader — Leader thực hiện partition assignment theo strategy đã cấu hình và gửi kết quả về Coordinator. RangeAssignor (default): assign partition liên tiếp theo topic — consumer 0 nhận partition 0,1; consumer 1 nhận 2,3.
Nếu partition không chia đều, consumer đầu tiên nhận nhiều hơn (uneven distribution across topics). RoundRobinAssignor: phân phối đều hơn bằng cách xen kẽ — mỗi consumer nhận partition luân phiên. StickyAssignor: cố gắng minimize sự thay đổi so với assignment trước. CooperativeStickyAssignor: như Sticky nhưng dùng cooperative protocol (không stop-the-world) — khuyến nghị cho ứng dụng mới theo KIP-429.
Thực tế: số consumer không nên vượt số partition — consumer thừa sẽ idle.
Khi scale out, thêm partition TRƯỚC khi thêm consumer để tránh idle consumers.

#14Partition key design trong Kafka: cách chọn key để tránh hotspot và đảm bảo ordering?

Trung Bình

Partition key quyết định message vào partition nào (hash(key) % numPartitions).

Chọn key sai dẫn đến: hotspot (một partition bị overload), hoặc mất ordering (message liên quan vào partition khác nhau). Nguyên tắc chọn key: chọn field có cardinality cao và phân phối đều — user_id, order_id, device_id tốt hơn country_code hay status.
Key phải là field mà ordering quan trọng trong context của nó: tất cả event của cùng order_id phải theo thứ tự → dùng order_id làm key. Tránh hotspot: nếu key phân phối không đều (một số user_id hoạt động cực nhiều), thêm random suffix: user_id + '-' + random(0, N) — nhưng sẽ mất ordering global per user.
Kỹ thuật khác: key salting key + timestamp_bucket — partition thay đổi theo thời gian.
Tăng số partition cũng giúp giảm hotspot nhưng không giải quyết root cause nếu key quá skewed.

#15Exactly-once semantics (EOS) trong Kafka hoạt động như thế nào? Giải thích idempotent producers và transactions.

Nâng Cao

Lưu ý: EOS có overhead về latency và throughput (~20-30%), chỉ dùng khi thực sự cần thiết (tài chính, billing).

#16Các chiến lược partition trong Kafka: khi nào dùng key-based, round-robin, hay custom partitioner?

Nâng Cao

Round-robin (khi message key là null): phân phối đều message sang các partition, tối ưu throughput nhưng không đảm bảo ordering.

Key-based partitioning (hash của key % numPartitions): đảm bảo tất cả message cùng key vào một partition — bắt buộc khi cần ordering (ví dụ: tất cả event của user_id=123 phải theo thứ tự).
Vấn đề hotspot: nếu key phân phối không đều (ví dụ: một số user_id hoạt động cực kỳ nhiều), một số partition sẽ overloaded — giải pháp là thêm random suffix vào key hoặc dùng custom partitioner với logic phức tạp hơn.
Custom partitioner cho phép route theo business logic (ví dụ: VIP customer vào partition riêng để ưu tiên xử lý).
Khi tăng số partition của existing topic, message key cũ có thể bị route sang partition khác — cần planning kỹ trước khi deploy.

#17Consumer rebalancing trong Kafka: vấn đề gì có thể xảy ra và cách tối ưu?

Nâng Cao

Rebalancing xảy ra khi consumer group thay đổi (thêm/xóa consumer, consumer crash, hoặc subscription thay đổi).

Trong quá trình rebalance, toàn bộ group dừng consume (stop-the-world), có thể gây latency spike.
Vấn đề phổ biến: session.timeout.ms quá ngắn → consumer bị kick ra group vì GC pause dài; max.poll.interval.ms quá ngắn → consumer xử lý lâu bị coi là dead.
Tối ưu: tăng session.timeout.ms và heartbeat.interval.ms, giảm max.poll.records, dùng incremental cooperative rebalancing (partition.assignment.strategy=CooperativeStickyAssignor) thay vì eager rebalancing — chỉ revoke partition thực sự cần chuyển, không dừng toàn bộ group.
Static group membership (group.instance.id) giúp tránh rebalance khi restart consumer (reuse partition assignment cũ trong vòng session.timeout.ms).

#18Dead Letter Queue (DLQ) trong Kafka là gì? Cách implement xử lý message lỗi?

Nâng Cao

Kafka có Spring Kafka @RetryableTopic và @DltHandler để implement pattern này tự động với exponential backoff.
Cần cân nhắc: nếu dừng xử lý để retry, các message sau sẽ bị delay (ordering preserved nhưng throughput giảm); nếu skip và gửi DLQ, ordering bị phá vỡ nhưng throughput không bị ảnh hưởng.
Monitoring DLQ là critical — nên alert khi DLQ có message, có team xử lý manual hoặc replay sau khi fix bug.
Một best practice khác là dùng retry topic với tên topic.RETRY-1, topic.RETRY-2 với delay tăng dần.

#19Monitoring và performance tuning Kafka: các metrics quan trọng và cách tối ưu throughput/latency?

Nâng Cao

Tối ưu throughput producer: tăng batch.size (16KB→128KB), thêm linger.ms (0→20ms), bật compression.type=lz4 giảm network I/O.
Tối ưu throughput consumer: tăng fetch.min.bytes và fetch.max.wait.ms để fetch theo batch lớn, tăng max.poll.records.
Tối ưu broker: tăng số thread I/O (num.io.threads), dùng dedicated disk cho Kafka log (tránh share với OS), đặt log.dirs trên multiple disk để parallel I/O.
Dùng Kafka Exporter + Prometheus + Grafana cho observability stack.

#20Kafka trong kiến trúc microservices: event-driven architecture, saga pattern, và các pitfall cần tránh?

Nâng Cao

Saga pattern dùng Kafka để coordinate distributed transaction: choreography-based saga — mỗi service lắng nghe event và publish event tiếp theo, không có central coordinator; orchestration-based saga — Saga Orchestrator gửi command đến từng service qua Kafka và lắng nghe response.

#21Schema Registry trong Kafka ecosystem là gì? Tại sao cần dùng Avro/Protobuf thay vì JSON?

Nâng Cao

Schema Registry (Confluent) là service lưu trữ và quản lý schema của Kafka message, đảm bảo producer và consumer đồng thuận về format dữ liệu.

JSON không có schema enforcement — producer có thể thay đổi field name/type mà không báo, khiến consumer bị lỗi.
Avro và Protobuf là binary serialization formats: nhỏ hơn JSON ~3-10x (giảm storage và network cost), có schema evolution với compatibility rules (BACKWARD, FORWARD, FULL).
Schema Registry lưu schema theo subject (mặc định là topic name), assign schema ID; message chỉ chứa schema ID (4 bytes) thay vì full schema, consumer lookup schema từ registry khi cần.
Khi thêm field mới với default value (BACKWARD compatible), consumer cũ vẫn deserialize được; xóa field hoặc đổi type là BREAKING change bị Schema Registry reject nếu cấu hình không cho phép.
Đây là best practice bắt buộc trong production Kafka cluster.

#22KRaft mode trong Kafka là gì? Tại sao Kafka loại bỏ ZooKeeper?

Nâng Cao

#23Kafka consumer lag là gì? Cách monitor và alert khi lag quá cao?

Nâng Cao

Consumer lag = (Log End Offset của partition) - (Current Offset của consumer group) = số message consumer chưa xử lý.

Lag cao đồng nghĩa consumer đang xử lý chậm hơn producer ghi vào, có thể dẫn đến message bị expire (nếu log.retention.hours ngắn) hoặc data trễ quá mức chấp nhận.
Monitor lag:

bash

# Kafka CLI
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group

# Kết quả: TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG

Prometheus stack: Kafka Exporter (kafka_consumergroup_lag) + Grafana dashboard.
Alert khi: lag vượt threshold (ví dụ > 10,000 records), lag tăng liên tục qua nhiều scrape interval.
Nguyên nhân lag cao: consumer xử lý chậm (heavy business logic, DB call), GC pause, rebalancing, downstream bottleneck.
Giải pháp: tăng số consumer (nếu còn partition chưa được assign), optimize processing logic, tăng max.poll.records, cân nhắc async processing.

#24Log compaction trong Kafka là gì? Tombstone record dùng để làm gì?

Nâng Cao

Log compaction là cơ chế Kafka chỉ giữ lại message mới nhất cho mỗi message key, xóa các phiên bản cũ — thay vì xóa theo thời gian (log.retention.hours).

Cấu hình: cleanup.policy=compact. Ứng dụng: KTable trong Kafka Streams (compacted topic lưu state mới nhất), changelog topic của Kafka Connect, event sourcing snapshot (chỉ cần state cuối). Tombstone record: message có key nhưng value=null — báo hiệu cho log compaction xóa tất cả message có key đó khỏi compacted log.
Dùng để xóa data khỏi KTable hoặc downstream systems (GDPR right-to-erasure).
Log compaction không chạy real-time: log.cleaner.min.cleanable.ratio và log.cleaner.backoff.ms control tần suất.
Active segment (segment đang ghi) không bao giờ bị compact — chỉ inactive segments.
Consumer đọc compacted topic có thể thấy cả message cũ và mới trong một scrub cycle.

#25min.insync.replicas và acks=all kết hợp bảo vệ data như thế nào? Khi nào producer throw NotEnoughReplicasException?

Nâng Cao

Đây là trade-off giữa availability và durability.
Nếu muốn availability hơn durability: giảm min.insync.replicas=1 (có thể mất data nếu leader fail trước follower sync).
Trong production tài chính: replication.factor=3, min.insync.replicas=2, acks=all là cấu hình bắt buộc.

#26MirrorMaker 2 là gì? Cách thực hiện cross-cluster replication trong Kafka?

Nâng Cao

MirrorMaker 2 (MM2) là công cụ replication chính thức của Kafka để sao chép data giữa các Kafka cluster (active-passive DR, active-active multi-region, migration).

MM2 xây trên Kafka Connect framework: dùng MirrorSourceConnector (replicate messages), MirrorCheckpointConnector (replicate consumer offsets), MirrorHeartbeatConnector (monitor replication lag).
Topic được replicate với prefix: topic orders trên cluster A trở thành clusterA.orders trên cluster B — tránh naming conflict.
Offset translation: MM2 translate consumer offsets giữa cluster để consumer có thể failover sang cluster B và tiếp tục từ đúng vị trí.
Config cơ bản:

properties

clusters = source, target
source.bootstrap.servers = source:9092
target.bootstrap.servers = target:9092
source->target.enabled = true
source->target.topics = .*

Alternative: Confluent Replicator (commercial), hoặc dùng Kafka application-level replication (consumer từ source, produce sang target).

#27ksqlDB là gì? Khác gì so với Kafka Streams?

Nâng Cao

#28Event sourcing với Kafka: pattern hoạt động như thế nào và lợi ích/nhược điểm?

Nâng Cao

Event sourcing lưu trữ mọi thay đổi state của domain object dưới dạng immutable event (append-only log) thay vì chỉ lưu state hiện tại.

Kafka là storage backend lý tưởng: durable, ordered, replayable.

Pattern hoạt động:

Mọi action (OrderPlaced, PaymentProcessed, OrderShipped) được publish như event vào Kafka topic;
Current state được derive bằng cách replay tất cả event từ đầu hoặc từ snapshot gần nhất;
CQRS thường đi kèm: Command side (write) publish event → Kafka → Event handler rebuild read model cho Query side. Lợi ích: audit trail đầy đủ, time-travel debugging, eventual consistency tự nhiên, dễ scale read side. Nhược điểm: eventual consistency phức tạp hơn strong consistency; query current state cần rebuild từ events (giải quyết bằng snapshot và read model); schema evolution khó (events immutable nên không thể sửa); storage lớn theo thời gian

Dùng log compaction để snapshot state, giảm replay time.

#29Transactional Outbox Pattern là gì? Tại sao cần thiết khi dùng Kafka với database?

Nâng Cao

Hai thao tác này không thể wrap trong 2-phase commit thực tế. Outbox pattern giải quyết bằng cách:

Trong cùng DB transaction, write business data VÀ write event vào bảng outbox trong DB;
Một separate process (Outbox Processor/CDC) đọc bảng outbox và publish lên Kafka;
Sau khi publish thành công, đánh dấu event là processed (hoặc xóa)

Implement bằng Debezium CDC (capture thay đổi outbox table trực tiếp từ DB transaction log) → zero-polling-delay.

Nhược điểm: thêm độ phức tạp, publish có thể bị delay nhỏ.

Đây là pattern bắt buộc trong distributed system production-grade.

#30Kafka vs RabbitMQ: so sánh chi tiết trade-offs để chọn đúng cho từng use case?

Nâng Cao

Kafka phù hợp cho event streaming với replay và throughput cao; RabbitMQ phù hợp cho task queue với routing phức tạp và TTL ngắn.

#31CQRS pattern với Kafka: Command side và Query side tách biệt như thế nào?

Nâng Cao

CQRS (Command Query Responsibility Segregation) tách biệt model cho write (Command) và read (Query).

Kafka là event bus lý tưởng để kết nối hai side:

Command side: nhận command (CreateOrder), validate, update write model (DB), publish event (OrderCreated) lên Kafka;
Event consumer: subscribe Kafka topic, xử lý event để update read model (search index, cache, materialized view phù hợp cho query);
Query side: đọc từ read model đã được optimize (Elasticsearch cho full-text search, Redis cho low-latency lookup, PostgreSQL view cho reports)

Cần careful UX: sau khi user tạo order, show optimistic UI thay vì đọc lại từ DB ngay.

Luyện Phỏng Vấn IT — 2000+ Câu Hỏi Phỏng Vấn IT Có Đáp Án 2026

Kafka

Luyện Phỏng Vấn IT — 2000+ Câu Hỏi Phỏng Vấn IT Có Đáp Án 2026

Kafka