44 System Design Interview Questions with Answers

#1 What is the CAP Theorem and why does it matter in distributed system design?

Basic

The CAP Theorem states that a distributed system can guarantee at most two of three properties at the same time: Consistency (every read returns the most recent write), Availability (every request receives a response), and Partition Tolerance (the system keeps working even when the network is split).

In practice, Partition Tolerance is always required because network failures are unavoidable, so the real choice, made when a partition actually happens, is between CP (keep consistency, sacrifice availability) and AP (keep availability, sacrifice consistency).

Examples: HBase and ZooKeeper are CP; Cassandra and DynamoDB (with its default eventually consistent reads) are usually classified as AP; a traditional RDBMS is often labeled CA, which only makes sense in a non-distributed environment. When designing, decide what the business prioritizes: a banking system needs CP, while a social network can accept AP with eventual consistency.

#2 How do ACID and BASE differ, and when should each model be used?

Basic

ACID (Atomicity, Consistency, Isolation, Durability) is the set of properties that makes RDBMS transactions reliable: a transaction either succeeds entirely or is rolled back completely, data always satisfies the defined constraints, concurrent transactions are isolated from one another, and committed data survives crashes.

BASE (Basically Available, Soft state, Eventually consistent) is the model of distributed NoSQL systems: the system stays available most of the time, state may change over time even without new input, and replicas eventually converge to a consistent state.

ACID fits financial transactions, order placement, and any workflow that demands strict correctness. BASE fits systems that need to scale out, such as social networks, analytics, and e-commerce shopping carts, where occasionally reading slightly stale data is an acceptable price for performance and scalability.

#3 What are Vertical and Horizontal Scaling? What are the pros and cons of each?

Basic

Vertical Scaling (scale up) means upgrading the hardware of a single server: more CPU, RAM, or SSD. It is simple and needs no code changes, but it is capped by the largest machine available and leaves a single point of failure.

Horizontal Scaling (scale out) means adding more servers and distributing load across them through a load balancer. It has no theoretical ceiling and is more fault-tolerant, but it is more complex because you must handle distributed state, session management, and data consistency.

Vertical scaling suits small and medium systems that need a quick fix, and it is often the first step for databases, which are harder to scale horizontally than application servers. Horizontal scaling is the long-term choice for large systems such as Netflix or Google: stateless services scale out easily, whereas databases need sharding or replicas to scale out effectively.

#4 How does Load Balancing work? What are the common load balancing algorithms?

Basic

A Load Balancer sits between clients and servers and distributes incoming requests so that no server is overloaded; it also improves availability by redirecting traffic away from failed servers.

Common algorithms: Round Robin (rotate through servers in order; simple but ignores actual load), Weighted Round Robin (assign weights according to capacity), Least Connections (send to the server with the fewest active connections; good when request processing times vary), IP Hash (hash the client IP so the same client always reaches the same server; useful for session stickiness), Least Response Time (pick the fastest server).
A Layer 4 LB works at the transport layer (TCP/UDP) and is faster but less intelligent; a Layer 7 LB works at the application layer and can route on URL, header, or cookie, which is more flexible but adds overhead.
AWS ALB, Nginx, and HAProxy are popular solutions.
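
As an illustration of two of the algorithms above, here is a minimal in-memory sketch of Round Robin and Least Connections. The class and method names (`RoundRobin`, `LeastConnections`, `acquire`, `release`) are hypothetical and not taken from any real load balancer; a production balancer would also handle health checks and concurrency.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotating order, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1
```

With servers `["a", "b", "c"]`, `RoundRobin.pick()` cycles a, b, c, a, and so on, while `LeastConnections` keeps routing to whichever server has released the most requests.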

#5 What is a CDN and how does it improve system performance?

Basic

A CDN (Content Delivery Network) is a network of geographically distributed servers that store cached copies of static content (images, JS, CSS, video) on edge nodes close to users.

When a user requests a file, the CDN routes the request to the nearest edge node instead of the origin server, which cuts latency significantly: for example, a user in Vietnam hits a CDN node in Singapore rather than an origin server in the US.
Beyond latency, a CDN offloads the origin server, raises global throughput, usually includes DDoS protection, and improves availability (the cache can keep serving while the origin is temporarily down).
CDNs work best for static assets and video streaming; dynamic content can also be generated or cached at the edge with Edge Computing platforms (Cloudflare Workers, Vercel Edge).
Major CDNs: Cloudflare, AWS CloudFront, Akamai, Fastly.

#6 How do Forward Proxy and Reverse Proxy differ? When is each used?

Basic

A Forward Proxy sits in front of clients and sends requests to the outside on their behalf: the client knows about the proxy, but the external server does not know where the request really came from.

Used for: bypassing geo-restrictions, hiding the client IP, filtering content in a corporate network, and caching outbound requests (to save bandwidth).
A Reverse Proxy sits in front of servers and receives client requests on their behalf: the client believes it is talking to the real server and does not know there is a proxy in between.
Used for: load balancing, SSL termination, caching, rate limiting, authentication, and hiding the internal network structure.
Nginx and HAProxy commonly act as reverse proxies in production; Squid is a popular forward proxy.
An API Gateway is essentially a specialized reverse proxy with extra features such as auth, routing, and transformation.

#7 What are Latency and Throughput? Why do they often trade off against each other?

Basic

Latency is the time to complete a single request (measured in ms); lower is better, and it reflects how responsive the system is. Throughput is the number of requests or operations the system handles per unit of time (requests/second, transactions/second); higher is better, and it reflects the system's capacity.

Trade-offs appear when: batching raises throughput but adds latency to each item; caching raises throughput but can add latency on a cache miss; adding a queue buffer raises throughput but adds waiting time. In practice, start from the SLA: real-time gaming needs latency under 50 ms, while a batch ETL system cares more about throughput. Benchmarks should report P50, P95, and P99 latency to expose tail latency, not just the average.
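
To see why percentiles matter more than the average, consider this small sketch. It uses the nearest-rank percentile definition (one of several common definitions) and made-up sample latencies; `percentile` and `latencies_ms` are illustrative names only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up data: nine fast requests and one very slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 950]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

Here the mean is 107.2 ms, a value no real request experienced, while P50 is 14 ms and P99 is 950 ms. The percentiles show that most users see a fast system and a small fraction see a very slow one.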

#8 What are the consistency models in distributed systems?

Basic

Common consistency models:

Strong Consistency (Linearizability): once a write succeeds, every later read sees the latest value. It is the easiest model to program against but has the highest latency because nodes must coordinate.
Eventual Consistency: data becomes consistent after some time, but stale reads are possible in the meantime. Used in Cassandra, DynamoDB, and DNS.
Read-your-writes Consistency: after a user writes, that same user always reads the new data (even if other users do not see it yet). Important for good UX in social media.
Causal Consistency: causally related operations are seen in the correct order (A posts, B comments, and everyone sees the comment after the post).
Monotonic Read: a user never sees data go back in time (never reads v2 and then v1).

The choice of consistency model directly affects latency, availability, and user experience.

#9 What are the caching strategies for a frontend app?

Intermediate

A frontend application has several caching layers, each serving a different purpose. The browser layer uses HTTP headers such as Cache-Control and ETag to cache static resources (JS, CSS, images), which reduces network requests when the user returns to the page.

The application layer uses React Query or SWR with a stale-while-revalidate strategy to cache API data: show the cached data immediately, then refresh it in the background, so the UI feels fast. A Service Worker makes offline-first applications possible by caching resources in the Cache API.

There is also localStorage for user preferences, sessionStorage for temporary form drafts, IndexedDB for large datasets that need querying, and a CDN for static assets to reduce latency across regions.

#10 What is Database Sharding? What are common sharding strategies and when should you use it?

Intermediate

Sharding splits the data of one database into smaller parts (shards), each living on its own database server, so the system can scale out when data exceeds the capacity of a single server.

Strategies:
- Range-based sharding: split by key range (user_id 1-1M on shard 1). Easy to implement but prone to hot spots.
- Hash-based sharding: hash the key to spread data evenly. Avoids hot spots but makes range queries hard.
- Directory-based sharding: a lookup table maps key to shard. The most flexible option, at the cost of an extra lookup.
- Geographic sharding: split by region. Good for compliance and latency.

Challenges: cross-shard joins are expensive, distributed transactions are complex, and rebalancing when adding shards is hard. Reach for sharding only after exhausting other optimizations (indexes, caching, read replicas) and when the dataset has truly grown to multiple terabytes, beyond what one server can handle.
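
The range-based and hash-based strategies above can be sketched as two routing functions. This is a minimal illustration: the function names and the one-million-rows-per-shard figure are assumptions for the example, and real systems often use consistent hashing rather than a plain modulo.

```python
import hashlib


def range_shard_for(user_id: int, rows_per_shard: int = 1_000_000) -> int:
    """Range-based: user_id 1..1M goes to shard 1, 1M+1..2M to shard 2, etc."""
    return (user_id - 1) // rows_per_shard + 1


def hash_shard_for(key: str, num_shards: int) -> int:
    """Hash-based: a stable hash of the key, reduced modulo the shard count.

    hashlib is used instead of the built-in hash() because hash() of a str
    is randomized per process and would route inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Note that with plain modulo hashing, changing `num_shards` remaps most keys, which is one concrete form of the rebalancing challenge mentioned above.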

#11 What are Read Replicas and how do they help scale a database? What are the limitations?

Intermediate

A Read Replica is a copy of the primary database that serves only read queries, while the primary (master) takes all writes; changes are usually shipped from the primary to the replicas through asynchronous replication.

Benefits: it takes read load off the primary (reads are often the bulk of the workload, commonly cited as 80-90%), read throughput scales by adding replicas, and a replica can serve analytics and reporting without affecting production. AWS RDS, PostgreSQL, and MySQL all support read replicas.

Key limitations: replication lag (a replica can trail the primary by milliseconds to seconds, so reads may be stale, that is, eventually consistent); the application needs logic to route reads and writes; automatic failover needs extra configuration; and writes remain a bottleneck because there is only one primary. The options for a write bottleneck are sharding or multi-master replication, which is more complex because of conflict resolution.

#12 How do Cache-aside, Write-through, and Write-back caching strategies differ?

Intermediate

Cache-aside (Lazy Loading): the application manages the cache itself. It checks the cache first; on a miss it reads from the DB and populates the cache.

Pros: only data that is actually read gets cached, and a cache failure does not block reads. Cons: the first read of each key always pays extra latency, and cached data can go stale if the DB is updated directly.
Write-through: every write updates both the cache and the DB before the response is returned.
Pros: the cache stays fresh as long as all writes go through it. Cons: higher write latency, and rarely read data wastes cache memory.
Write-back (Write-behind): the write goes to the cache only, the response is returned immediately, and the data is flushed to the DB asynchronously.
Pros: very low write latency, good for write-heavy workloads. Cons: risk of data loss if the cache crashes before flushing, and more complexity.
Redis is commonly used for all three patterns; cache-aside is the most common in practice because it is simple and fits most use cases.
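
A minimal sketch of the cache-aside read and write paths follows. Plain dicts stand in for Redis and the database, and the class name `CacheAside` and the `db_reads` counter are hypothetical, added only so the behavior can be observed.

```python
class CacheAside:
    """Cache-aside with dicts standing in for the cache and the database."""

    def __init__(self, db):
        self.db = db        # stand-in for the real database
        self.cache = {}     # stand-in for Redis
        self.db_reads = 0   # counts how often we had to hit the database

    def get(self, key):
        if key in self.cache:            # cache hit
            return self.cache[key]
        self.db_reads += 1               # cache miss: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for next time
        return value

    def update(self, key, value):
        self.db[key] = value             # write to the database first
        self.cache.pop(key, None)        # then invalidate the cached copy
```

Invalidating on write (rather than updating the cached value) is one common choice here: the next read repopulates the cache from the database, which keeps the write path simple.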

#13 What is Rate Limiting? What are common algorithms and how to implement it?

Intermediate

Rate Limiting controls how often a client, IP, or user may send requests, protecting the system from abuse and DDoS and ensuring fair usage.

Algorithms:
- Token Bucket: a bucket holds tokens, each request consumes one token, and tokens are refilled at a fixed rate; allows short bursts.
- Leaky Bucket: requests are processed at a fixed rate; smooths out bursts but does not allow them.
- Fixed Window Counter: count requests in a fixed window (for example, each minute); simple, but a burst that straddles a window boundary can briefly pass up to twice the limit.
- Sliding Window Log: store the timestamp of every request; the most accurate but memory-hungry.
- Sliding Window Counter: combines fixed windows with a sliding estimate; a good balance.

Implementation: Redis with INCR + EXPIRE for distributed rate limiting; an Nginx module; or the built-in feature of an API Gateway (AWS API Gateway, Kong). Return 429 Too Many Requests with a Retry-After header when the limit is exceeded.
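
The Token Bucket algorithm listed above can be sketched in a few lines. This is a single-process illustration, not a distributed limiter; the `TokenBucket` class and its injectable `clock` parameter are assumptions made so the behavior is easy to test.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled continuously
    at `refill_per_sec`; each allowed request spends one token."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)   # start full, so a burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        earned = (now - self.last) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller would respond with 429 Too Many Requests
```

With `capacity=2` and `refill_per_sec=1`, two back-to-back requests pass (the burst), the third is rejected, and one more is allowed after a second has elapsed.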

#14 What is Connection Pooling and why is it critical for database performance?

Intermediate

Connection Pooling keeps a pool of already-established connections; the application borrows a connection from the pool and returns it after use, instead of opening and closing a new connection each time and paying for the TCP handshake and authentication (roughly 20-100 ms per connection).

Benefits: much lower latency, a cap on the number of connections reaching the DB (PostgreSQL's max_connections is typically set around 100-500), and higher overall throughput. Pool size matters: too small creates a bottleneck, too large can cause OOM or overload the DB. A widely used sizing formula from the PostgreSQL Wiki is pool_size ≈ (core_count * 2) + effective_spindle_count. This is guidance for the DB server side and is not specific to PgBouncer; with PgBouncer, size the pool from max_connections and the actual concurrency. Popular tools: PgBouncer (PostgreSQL), HikariCP (Java), pg-pool (Node.js).

#15 What is the CQRS Pattern? When should it be applied and what are the challenges?

Intermediate

CQRS (Command Query Responsibility Segregation) separates the read model (Query) from the write model (Command) instead of using one model for all CRUD operations. The Command side handles mutations and usually uses a rich domain model; the Query side is optimized for reads and can use its own denormalized views.

Benefits: the read model scales independently of the write model; the Query side can use a different database (Elasticsearch for search, Redis for cache); each side is optimized on its own. It is often combined with Event Sourcing: each Command produces Events, and Events update the Read Model (eventual consistency).

Suitable when: the read/write ratio is heavily skewed, the domain logic is complex, or many different read views are needed. Challenges: eventual consistency between the write and read sides; higher operational complexity; harder debugging; overkill for simple CRUD. Real-world example: in e-commerce, order placement (Command) and the order listing dashboard (Query) use separate stores.

#16 What is Event Sourcing? Benefits and limitations compared to traditional state storage?

Intermediate

Event Sourcing stores every state change of the application as a sequence of immutable events instead of storing only the current state, much like a bank's transaction log rather than just the current balance.

Example: instead of storing account.balance = 1000, store [Deposited(500), Deposited(700), Withdrew(200)]; the balance is the result of replaying the events.

Benefits: a complete audit trail for free, the ability to rebuild state at any point in history, time-travel debugging, easy integration with CQRS and projections, and the event stream as the single source of truth. Limitations: reading current state requires a replay (mitigated by snapshots), event schema evolution is hard when the domain changes, storage cost is higher, and the learning curve is steep. Suitable for: banking and fintech (mandatory audit trail), booking systems, and domain-driven design with complex business rules. EventStoreDB is a database built for Event Sourcing; it can also be implemented on PostgreSQL or Kafka.
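
The balance example above can be written as a short replay function. The event classes mirror the names in the example; the `replay` function and its `balance` starting argument (used here to show how a snapshot shortens the replay) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)   # frozen: events are immutable once recorded
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrew:
    amount: int


def replay(events, balance=0):
    """Fold a list of events into the current balance.

    `balance` is the starting state: 0 for a full replay, or a snapshot
    value when replaying only the events recorded after that snapshot.
    """
    for event in events:
        if isinstance(event, Deposited):
            balance += event.amount
        elif isinstance(event, Withdrew):
            balance -= event.amount
        else:
            raise ValueError(f"unknown event: {event!r}")
    return balance


events = [Deposited(500), Deposited(700), Withdrew(200)]
```

`replay(events)` returns 1000, matching the stored balance in the example. Replaying only the last event on top of a snapshot of 1200 (the state after the first two events) gives the same result.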

#17 What are horizontal scaling patterns for stateful services? How to handle session state?

Intermediate

To scale a stateful service horizontally, you either pin each user to one instance or move the state out of the instance; the main options are Sticky Sessions, an externalized store (Redis), and JWT, each with its own trade-offs in complexity and failover.

Sticky Sessions: the load balancer routes the same user to the same server based on a cookie. Simple, but load becomes uneven and failover is hard.
Externalized Session Store: keep sessions in Redis/Memcached instead of in process memory. Every instance can read them, so scaling and failover are easy.
JWT Tokens: encode the state into the token so the server stores nothing. Fully stateless, but a token cannot be revoked before it expires (worked around with a blacklist).
Database-backed Sessions: keep sessions in the DB. Simple, but slower than Redis.

In practice, JWT for authentication plus Redis for short-lived session data is the most common pattern.

#18 How do Microservices and Monolith differ? When should you migrate to Microservices?

Intermediate

A Monolith is an application deployed as a single unit: simple to develop, test, and deploy at first, with no network overhead between components. Microservices split the application into small independent services, each with its own database and its own deployment, which allows per-service scaling, independent teams, and a polyglot tech stack.

Microservices are not always better: the overhead of a distributed system (network latency, distributed transactions, service discovery, observability) is substantial. Start with a Monolith or a Modular Monolith (a monolith with clear internal boundaries).

Migrate to microservices only when: the team is large (as a rough rule of thumb, more than 50 engineers) and deployments have become a bottleneck, modules have clearly different scaling needs, or polyglot technology is required. Martin Fowler's "MonolithFirst" advice is the same: build a good monolith first, then split it gradually along domain boundaries (DDD Bounded Contexts).

#19 What is an API Gateway? Its role and key features in a microservices architecture?

Intermediate

An API Gateway is the single entry point for all client requests to the microservices; it acts as a reverse proxy with additional features. Responsibilities: routing (forward each request to the right service), authentication/authorization (centralized instead of every service verifying on its own), rate limiting, SSL termination, request/response transformation, API versioning, and caching.

Benefits: clients need to know one endpoint instead of many service URLs; request aggregation reduces round trips (see also the BFF pattern, Backend for Frontend); cross-cutting concerns can be added without touching the services. Limitations: a single point of failure without an HA setup; a potential bottleneck; extra latency; and coupling if it is overloaded with logic.

API Gateway vs Service Mesh: an API Gateway handles North-South traffic (external to internal), a Service Mesh handles East-West traffic (service to service). Popular solutions: AWS API Gateway, Kong, Nginx, Traefik, Envoy.

#20 What is a Service Mesh and why is it needed in microservices architecture?

Intermediate

A Service Mesh is an infrastructure layer that handles service-to-service communication in microservices, usually implemented with the sidecar proxy pattern (each service pod has a proxy container alongside it).

A Service Mesh provides: mTLS (mutual TLS) for encrypted communication between services, load balancing, circuit breaking, retry logic, timeouts, distributed tracing, and metrics collection, all without changing application code.
The problem it solves: with dozens of microservices, implementing networking concerns inside every service (through a library) is hard to maintain, labor-intensive, and prone to inconsistency.
A Service Mesh moves these concerns into the infrastructure layer.
Istio (with Envoy sidecars) and Linkerd are the two most popular solutions.
Trade-off: operational complexity is very high, and Istio in particular is heavy with a steep learning curve, so it only fits organizations with a capable platform engineering team.
Many organizations choose a simpler solution such as Consul Connect or AWS App Mesh.

#21 What is the Circuit Breaker Pattern? How does it prevent cascading failures?

Intermediate

A Circuit Breaker protects the system from cascading failures when a dependency fails; the idea is borrowed from electrical circuit breakers.

It works through 3 states: Closed (normal operation, failure rate is monitored), Open (once the failure rate exceeds a threshold, all requests fail fast without calling the broken service, giving it time to recover), and Half-Open (after a timeout, a few trial requests are let through; if they succeed the breaker closes, if they fail it opens again).

Example without a Circuit Breaker: Payment Service calls a slow Fraud Detection Service, threads block waiting for timeouts, the thread pool is exhausted, Payment Service goes down too, and the failure cascades. With a Circuit Breaker, Payment Service fails fast (or uses a fallback) while the breaker is open, so its threads stay free.
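
The three states can be sketched as a small wrapper around a call. This is a simplified illustration: it trips on a count of consecutive failures rather than a failure rate, lets a single trial call through in Half-Open, and is not thread-safe. The names `CircuitBreaker`, `CircuitOpenError`, and the `clock` parameter are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0          # consecutive failures seen so far
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast, dependency is unhealthy")
            self.state = "half_open"   # timeout elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, opens the breaker.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"
        return result
```

In the Payment Service example, the call to Fraud Detection would go through `breaker.call(...)`, and a `CircuitOpenError` would be handled with a fallback instead of blocking a thread on a timeout.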

#22 What problem does the Saga Pattern solve in microservices? Choreography vs Orchestration?

Intermediate

In microservices each service has its own database, so distributed transactions (2PC) are usually impractical: they are complex and create tight coupling. The Saga Pattern instead splits a distributed transaction into a sequence of local transactions, each publishing an event; if a step fails, compensating transactions undo the previous steps. Example Order Saga: CreateOrder → ReserveInventory → ProcessPayment → ShipOrder; if ProcessPayment fails → UnreserveInventory → CancelOrder.

Choreography: each service listens for events and decides its next action on its own. There is no central coordinator, so coupling is looser, but the overall flow is hard to follow and circular dependencies creep in easily.
Orchestration: a central Saga Orchestrator coordinates the steps and sends commands to each service. The flow is easy to understand and debug, but the orchestrator can become a bottleneck and accumulate business logic.

Temporal.io and AWS Step Functions are tools that support orchestrated sagas.
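
A minimal in-process sketch of the orchestration variant follows: run the steps in order, and on failure run the compensations of the completed steps in reverse. The `run_saga` function and its step tuples are hypothetical; a real orchestrator would also persist progress and retry compensations.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    Returns (succeeded, log). If an action raises, the compensations of all
    previously completed steps run in reverse order and the saga stops.
    """
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}:compensated")
            return False, log
        log.append(f"{name}:ok")
        completed.append((name, compensate))
    return True, log
```

For the Order Saga above, if the ProcessPayment action raises, the log shows ReserveInventory being compensated first and CreateOrder second, which corresponds to UnreserveInventory followed by CancelOrder.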

#23 What is Event-Driven Architecture? Benefits and challenges when implementing?

Intermediate

Event-Driven Architecture (EDA) is an architecture in which components communicate through events (notifications that something has happened) instead of direct API calls. Producers emit events without knowing who the consumers are; consumers subscribe and process events asynchronously, which yields loose coupling.

Benefits: producers and consumers are independent (they can be deployed, scaled, and can fail independently), new consumers can be added without changing producers, it is a natural fit for audit logging and event sourcing, and buffering absorbs high throughput.

Challenges: debugging and tracing are harder because the flow is non-linear; eventual consistency, since consumers may lag behind producers; ordering guarantees are limited (Kafka guarantees ordering only within a partition); idempotency, since consumers must handle duplicate events; and schema evolution when event formats change. Important patterns: a dead letter queue for failed events, an event schema registry (Confluent Schema Registry), and the outbox pattern to make the DB write and the event publish atomic.

#24 How do Kafka and RabbitMQ differ? When to use each?

Intermediate

RabbitMQ is a traditional queue-based message broker: messages are pushed to consumers and deleted once acknowledged, it supports many messaging patterns (pub/sub, point-to-point, routing with exchanges), and it works well for task queues and RPC. Kafka is a distributed log: messages are appended to a log and retained for a configured period (not deleted on consumption), consumers pull and track their own offsets, messages can be replayed, and it is designed for high throughput (millions of messages per second).

Choose RabbitMQ when:
- you need complex routing logic, task distribution, message TTL, and priority
- you integrate with several protocols (AMQP, STOMP, MQTT)
- a small team needs something easy to operate

Choose Kafka when:
- you do event streaming or event sourcing, or need an audit log with long retention
- multiple consumers need to read the same messages independently
- you run high-throughput analytics pipelines or real-time data integration

#25 What is Serverless Architecture? Pros, cons, and when is it appropriate?

Intermediate

Serverless (FaaS, Function as a Service) is a cloud computing model in which developers write only code (functions) and do not manage server infrastructure; the provider provisions, scales, and bills by actual invocations (pay-per-use). AWS Lambda, Google Cloud Functions, and Vercel Functions are popular options.

Pros: no infrastructure to manage, automatic scaling from zero to thousands of instances, cost-effective for sporadic or unpredictable traffic (no charge while idle), and lower operational overhead.

Cons: cold-start latency (Node.js and Python functions usually start much faster than JVM ones, and AWS Lambda features such as Provisioned Concurrency and SnapStart, which launched for Java in late 2022, largely mitigate it, but it remains a problem when those features are not used); an execution time limit (at most 15 minutes on Lambda); vendor lock-in; statelessness (each invocation is independent, so state must live in an external store); and a poor fit for long-running processes.

Suitable for: webhooks, scheduled jobs (cron), event-driven processing (S3 triggers, SQS), and APIs with variable traffic. Not suitable for: latency-sensitive real-time APIs that cannot tolerate cold starts above about 10 ms, or stateful applications.

#26 When to choose SQL vs NoSQL? What are the deciding factors?

Intermediate

SQL (Relational DB): use it when you need ACID transactions (finance, orders), the data has a clear and stable structure, or you need complex queries with JOINs. PostgreSQL and MySQL are good default choices for most applications.

NoSQL comes in several kinds:
- Document DB (MongoDB, Firestore): flexible schema, good for content management and user profiles
- Key-Value (Redis, DynamoDB): extremely fast, good for caching, sessions, and leaderboards
- Column-family (Cassandra, HBase): write-heavy workloads, time-series, and IoT data at large scale
- Graph DB (Neo4j): relationship-heavy queries, social networks, fraud detection

The decision is less about SQL vs NoSQL than about picking the right database for each use case. Polyglot persistence, meaning several database types in one system, is the approach of large systems (Netflix, for example, is reported to combine MySQL, Cassandra, Elasticsearch, and Redis).

#27 How does Database Indexing work? Types of indexes and when to use them?

Intermediate

An index is a data structure (usually a B-Tree) that lets the database find records quickly without a full table scan, reducing lookup cost from O(n) to O(log n).

B-Tree Index: the most common type; supports equality and range queries (=, <, >, BETWEEN, LIKE 'prefix%') and works well on high-cardinality columns.
Hash Index: extremely fast for equality lookups (=) but does not support range queries; used in memory-optimized tables.
Composite Index: an index on multiple columns, where column order matters (leftmost prefix rule); index (a,b,c) supports queries on (a), (a,b), and (a,b,c) but not on (b) alone or (c) alone.
Partial Index: indexes only a subset of rows (for example WHERE status='active'); smaller and more efficient.
Full-Text Index: for text search.
GIN/GiST Index (PostgreSQL): for arrays, JSONB, and geometric data.
Covering Index: contains every column the query needs, so no table rows have to be read.

Trade-off: every index costs storage and slows writes (the index must be updated too), so more indexes are not automatically better. Use EXPLAIN ANALYZE to understand the query plan before adding an index.
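
The leftmost prefix rule can be observed directly. The sketch below uses SQLite (via Python's standard `sqlite3` module) rather than PostgreSQL, purely because it runs without a server, so it uses SQLite's `EXPLAIN QUERY PLAN` instead of `EXPLAIN ANALYZE`. The table `t`, the index `idx_abc`, and the `plan` helper are made up for the example, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")


def plan(where_clause):
    """Return SQLite's plan description for SELECT * FROM t WHERE <clause>."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM t WHERE {where_clause}"
    ).fetchall()
    # The human-readable description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


for clause in ("a = 1", "a = 1 AND b = 2", "b = 2", "c = 3"):
    print(f"{clause:<16} -> {plan(clause)}")
```

Queries that constrain `a` (the leftmost column) are reported as a SEARCH using `idx_abc`, while queries on `b` alone or `c` alone fall back to a SCAN of the table.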

#28 What are Normalization and Denormalization? Trade-offs and when to use each?

Intermediate

Normalization organizes a database to reduce data redundancy and dependency through the Normal Forms (1NF, 2NF, 3NF, BCNF): data is split into several related tables to avoid duplication.

Pros: efficient storage, easy to keep consistent on update (only one place to change), low risk of inconsistency.

Cons: reconstructing data needs many JOINs, and JOINs are expensive at large scale.

Denormalization deliberately adds redundant data to speed up reads, for example storing the username in the posts table instead of JOINing with the users table on every query.

Pros: significantly faster reads and simpler queries.

Cons: duplicated data costs storage, updates are more complex (several places to change), and there is a risk of inconsistency. Decision: OLTP (transaction processing) usually normalizes to protect data integrity; OLAP (analytics, data warehouse) usually denormalizes (star/snowflake schema) for query speed. In NoSQL Document DBs, denormalization is the default: embed related data in the document if it is always read together.

#29 What is Blob Storage and when to use it instead of a database? How to design a file storage system?

Intermediate

Blob Storage is storage specialized for unstructured data (images, video, documents); using it instead of the database keeps DB backups small and protects query performance. AWS S3, Google Cloud Storage, and Azure Blob Storage are popular options. The right pattern: store the file in Blob Storage and keep only the metadata (URL, size, type, owner) in the database.

Upload flow: the client asks the backend for a pre-signed URL → uploads directly to S3 (bypassing the server and avoiding a bandwidth bottleneck) → the backend receives a callback or event and updates the DB. Optimizations: put a CDN (CloudFront) in front of S3 to serve files quickly worldwide, enable S3 Transfer Acceleration for international uploads, and use multipart upload for large files (over 100 MB). Security: pre-signed URLs with an expiration time, a non-public bucket policy, and virus scanning through a Lambda trigger. Storage tiers: S3 Standard → S3 IA (Infrequent Access) → S3 Glacier for archival, which cuts cost significantly.

#30 How would you design a real-time chat system?

Advanced

A real-time chat system should use WebSocket for two-way communication between client and server rather than HTTP polling, because WebSocket keeps the connection open and has much lower latency.

On the frontend the main components are MessageList (a virtualized list so thousands of messages render smoothly), MessageInput, and ChatSidebar showing the list of rooms. Use Zustand or Redux for state management of messages and online status, combined with Optimistic UI: show the message immediately, before the server confirms, for a smoother UX.

You also need an automatic reconnection strategy for dropped connections, lazy loading of older messages on scroll-up, and IndexedDB to cache messages offline so the app still works without a network.

#31 How would you design an infinite scroll feed (like Facebook/Twitter)?

Advanced

An infinite scroll feed combines several techniques to stay both smooth and efficient. Attach an Intersection Observer to a sentinel element at the end of the list; when that element enters the viewport, trigger loading of more data.

To display thousands of items without lag, use a virtual list with react-window or @tanstack/react-virtual so that only the items currently visible are rendered. On the API side, prefer cursor-based pagination over offset, because a cursor does not return duplicate items when new posts arrive; combine it with React Query's useInfiniteQuery to manage the cache and fetch states.

You also need to restore the scroll position when the user navigates away and comes back, show skeleton loading for better UX, and avoid firing duplicate requests while a page is still loading (debounce if you rely on raw scroll events).
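
The advantage of cursor-based pagination over offset can be shown with an in-memory list standing in for the posts table. The `fetch_page` function and the shape of the post records are assumptions for this sketch; the cursor here is simply the id of the last item the client has seen.

```python
def fetch_page(posts, cursor=None, limit=2):
    """Return (items, next_cursor) from `posts`, which is sorted newest first.

    The cursor is the id of the last item already seen, so the page is
    defined relative to that item, not to a position in the list.
    """
    items = [p for p in posts if cursor is None or p["id"] < cursor][:limit]
    next_cursor = items[-1]["id"] if len(items) == limit else None
    return items, next_cursor
```

Suppose the feed holds posts 5, 4, 3, 2, 1 and the client has loaded the first page (5 and 4, cursor 4). If post 6 is then published, an offset query for "skip 2, take 2" returns 4 and 3, repeating post 4, whereas `fetch_page(posts, cursor=4)` still returns 3 and 2.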

#32 How would you design a form builder (drag & drop)?

Advanced

A form builder needs two main modes: Edit mode, where field types are dragged onto a canvas, and Preview mode, which shows the form as the end user will see it. For drag and drop, @dnd-kit is a good choice because it is lightweight, accessible, and works well with React (react-beautiful-dnd is no longer actively maintained).

The core architecture is schema-driven: each form is represented as a JSON schema, and a component registry maps each field type to its React component, which makes adding new field types easy.

Advanced features include a validation rules engine for per-field validation, undo/redo using the command pattern to record the history of actions, and the ability to export form data and generate an endpoint that receives submissions.
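
The undo/redo idea can be sketched independently of React. Each command is a pair of callables (do, undo) kept on two stacks; the `History` class and the `add_field` helper are hypothetical names, and a plain list stands in for the form schema.

```python
class History:
    """Undo/redo with the command pattern: each command is a (do, undo) pair."""

    def __init__(self):
        self._undo, self._redo = [], []

    def execute(self, do, undo):
        do()
        self._undo.append((do, undo))
        self._redo.clear()   # a new action invalidates the redo branch

    def undo(self):
        if self._undo:
            do, undo = self._undo.pop()
            undo()
            self._redo.append((do, undo))

    def redo(self):
        if self._redo:
            do, undo = self._redo.pop()
            do()
            self._undo.append((do, undo))


fields = []            # stand-in for the form schema's field list
history = History()


def add_field(field):
    # Hypothetical helper: wraps "add a field to the canvas" as a command.
    # Undo pops the last field, which is valid because undo is strictly LIFO.
    history.execute(lambda: fields.append(field), lambda: fields.pop())
```

After `add_field("name")` and `add_field("email")`, calling `history.undo()` leaves only "name" in `fields`, and `history.redo()` restores "email".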

#33What is Data Partitioning? Horizontal vs Vertical Partitioning and partitioning strategies?

Advanced

Data Partitioning splits a large table or dataset into smaller parts to improve performance and manageability.

Horizontal Partitioning: split by rows, for example users 1-1M go to partition 1 and 1M-2M to partition 2; every partition has the same schema. When the partitions live on separate servers, this is called sharding.
Vertical Partitioning: split by columns, for example move rarely read blob/text columns to a separate table so that hot data sits together on disk, which improves cache efficiency.

Strategies for Horizontal Partitioning:
Range Partitioning: by date range or ID range; good for time-series data and makes archiving old data easy.
Hash Partitioning: by a hash of the partition key; spreads data evenly and avoids hot spots.
List Partitioning: by enumerated values, for example country.
Composite Partitioning: a combination of several strategies.

PostgreSQL table partitioning is a built-in solution: you declare a partition key, and PostgreSQL routes inserts to the right partition and prunes partitions at query time.

Benefits: partition pruning reduces the data scanned, queries can run in parallel across partitions, and old data is easy to archive or drop (dropping a partition is much faster than a bulk DELETE). Difference from sharding: partitioning usually happens within a single database instance, while sharding spreads data across multiple servers.
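
As a rough illustration of how hash and range partitioning choose a partition, the sketch below routes a key to a partition index. The function names and the `RANGE_BOUNDS` values are assumptions for the example; a database does this routing internally.

```python
import hashlib
from bisect import bisect_right


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Range partitioning: exclusive upper bound of each partition, ascending.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_partition(user_id: int) -> int:
    """Return the index of the first partition whose upper bound exceeds user_id."""
    idx = bisect_right(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"user_id {user_id} is outside all partition ranges")
    return idx
```

Range partitioning keeps neighbouring ids together, which helps range scans and archiving. Hash partitioning scatters them, which evens out load but makes range scans touch every partition.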

#34What is a Time-Series Database? When is it needed and what are popular solutions?

Advanced

A Time-Series Database (TSDB) is a database optimized specifically for timestamped data: metrics, sensor readings, financial ticks, logs, IoT data.

Characteristics: write-heavy (new data points are inserted continuously), old data is rarely queried, aggregation over time windows (avg, sum, min/max within an interval), and data retention policies (old data is deleted automatically).
Why a general-purpose RDBMS is a poor fit for time-series: B-Tree indexes become expensive to maintain under sustained high-rate inserts as tables grow, and you have to partition and archive old data manually.
How a TSDB optimizes: columnar storage (values of the same metric compress well, with ratios of 10-100x often cited), built-in time-based partitioning, downsampling (reducing the resolution of old data), and optimized aggregation functions.
InfluxDB: one of the most popular TSDBs, with the InfluxQL/Flux query languages and built-in retention policies.
TimescaleDB: a PostgreSQL extension that keeps familiar SQL; hypertables partition data automatically.
Prometheus + Grafana: the standard stack for infrastructure monitoring.
Cassandra: also commonly used for time-series at very large scale (IoT).
Use cases: application metrics (monitoring products such as Datadog and New Relic are built on time-series storage), stock prices, server monitoring, IoT sensor data.
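
Downsampling and time-window aggregation are easy to show in a few lines. This is a minimal sketch, not how any particular TSDB implements it; the `downsample` function and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points: list[tuple[int, float]], window_s: int) -> list[tuple[int, float]]:
    """Average (timestamp, value) points into fixed windows of `window_s` seconds."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)  # align to the window start
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Five raw points collapse into two one-minute averages.
raw = [(0, 1.0), (20, 3.0), (59, 5.0), (60, 10.0), (110, 20.0)]
minute_avg = downsample(raw, 60)
```

A retention policy would then delete the raw points after some age and keep only the downsampled series.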

#35How do Data Lake and Data Warehouse differ? When to use each?

Advanced

A Data Warehouse is a repository of structured, processed data organized under a specific schema (star/snowflake schema) for business intelligence and SQL analytics; data goes through ETL (Extract, Transform, Load) before it lands in the warehouse. Examples: Amazon Redshift, Google BigQuery, Snowflake.

A Data Lake is a repository of raw data in any format (structured, semi-structured, unstructured) at massive scale; the schema is applied on read (schema-on-read) rather than on write. Examples: AWS S3 + Glue + Athena, Azure Data Lake, Hadoop HDFS.

Use a Data Warehouse for: BI dashboards, regular business reports, data analysts who need straightforward SQL queries, and cases where data quality matters.
Use a Data Lake for: data science and ML that need raw data, storing everything for later analysis (when you do not know in advance what you will need), log files, and clickstream data.
Data Lakehouse is a newer trend (Databricks Delta Lake, Apache Iceberg) that combines both: raw data stays in object storage, but with ACID transactions, schema enforcement, and query performance close to a warehouse.

#36What is Change Data Capture (CDC)? How it works and use cases?

Advanced

CDC is a technique for tracking and capturing every change (INSERT, UPDATE, DELETE) in a database and streaming those changes to other systems in near real time, instead of polling the database periodically.

Most common approach: log-based CDC reads the database transaction log (WAL in PostgreSQL, binlog in MySQL). It is non-intrusive, adds little overhead to production writes, and captures all changes, including DELETEs.
Debezium is one of the most widely used open-source CDC platforms; it connects to PostgreSQL/MySQL/MongoDB and streams changes into Kafka.
Use cases: cache invalidation (when the DB changes, invalidate the Redis cache right away); data synchronization (sync data from OLTP to a data warehouse in near real time instead of nightly batch ETL); microservices event sourcing (DB change → event); Elasticsearch sync (keep the full-text search index up to date); audit logging.
Advantages over polling: lower latency (seconds instead of minutes or hours), less load on the DB, and no missed delete events.
Outbox Pattern: a way to make CDC-based event publishing reliable. Write the event to an outbox table in the same transaction as the business change, and let CDC read from the outbox.
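
The Outbox Pattern can be sketched with SQLite from the Python standard library. The table layout, the `create_order` and `relay_once` functions, and the `published` flag are assumptions for illustration. In a log-based setup, a CDC tool would read the outbox rows from the transaction log instead of the polling relay shown here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(order_id: int) -> None:
    # The business row and the outbox row commit (or roll back) together.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'created')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("order_created", json.dumps({"order_id": order_id})),
        )


def relay_once(publish) -> int:
    """Stand-in for the CDC step: publish pending events in order, then mark them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking a row, the event is published again on the next run, so consumers should be idempotent.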

#37Design a URL Shortener system like bit.ly. Key components and technical decisions?

Advanced

Requirements: create a short URL from a long URL, redirect short → long, ~100M URLs/day (writes), ~1B redirects/day (reads, read-heavy at 10:1).

Short code generation: use Base62 encoding (a-zA-Z0-9) with 7 characters, which gives 62^7 ≈ 3.5 trillion unique codes. Avoid truncating an MD5/SHA hash to 7 characters, because truncated hashes collide and need extra collision handling; instead, convert an auto-increment ID to Base62.
Database: store the short_code → long_url mapping; the workload is read-heavy, so cache aggressively; options include Cassandra (scales well) or MySQL/PostgreSQL with a Redis cache.
Cache: roughly 80% of traffic goes to 20% of URLs (hot URLs), so cache the top URLs in Redis with LRU eviction to get a very high cache hit rate.
Redirect: 301 (permanent; browsers cache it, so server load is lower but repeat clicks are not tracked) vs 302 (temporary; browsers do not cache it by default, so every click can be tracked).

Architecture: stateless API servers → Redis cache → database. Add rate limiting to prevent abuse. Custom domain support needs DNS wildcard configuration. Analytics pipeline: click → Kafka → Spark → analytics DB. Scale: separate the read service (redirect) from the write service (create), because their load patterns differ.
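
The ID-to-Base62 conversion described above can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a Base62 short code back into the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Sequential IDs produce guessable codes. If that matters, one option is to encode a non-sequential ID (for example, from a distributed ID generator) instead.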

#38Design a real-time chat system like WhatsApp/Slack. How to handle connections and messages?

Advanced

Requirements: real-time messaging, online/offline status, message history, 1-1 and group chat.

Connection layer: WebSocket is a better choice than long polling because it is bi-directional, persistent, and low latency; each client maintains one WebSocket connection to a chat server.
Cross-server routing: users are connected to different servers, so a pub/sub layer (Redis Pub/Sub or Kafka) is needed. Server A receives a message for User B, who is connected to Server B → Server A publishes to Redis → Server B, which is subscribed, pushes the message to User B.
Message storage: a wide-column store such as Cassandra or HBase (Facebook Messenger has used HBase, Discord has used Cassandra), because the workload is write-heavy with a time-based access pattern. Schema: the partition key is (conversation_id) and the clustering key is (timestamp, message_id).
Offline messages: if the recipient is offline, store the message in the DB; on reconnect, the client fetches unread messages.
Status service: clients send a heartbeat every 5s to track online status, stored in Redis with a TTL.
Fanout for group chat: fanout-on-write (push to all members) vs fanout-on-read (members pull when needed); for large groups, fanout-on-write is expensive, so either limit group size or use a hybrid approach.
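
The heartbeat-plus-TTL status service can be illustrated with an in-memory stand-in for Redis keys that expire. The `PresenceTracker` class and the 15-second TTL (three missed 5-second heartbeats) are assumptions for the sketch; the caller passes `now` explicitly so the behaviour is deterministic.

```python
class PresenceTracker:
    """In-memory stand-in for Redis keys whose TTL is refreshed by heartbeats."""

    def __init__(self, ttl_s: float = 15.0) -> None:
        self.ttl_s = ttl_s
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Each heartbeat pushes the expiry forward, like SET with an expiry in Redis.
        self._expires_at[user_id] = now + self.ttl_s

    def is_online(self, user_id: str, now: float) -> bool:
        # A user with no entry, or an expired one, counts as offline.
        return self._expires_at.get(user_id, 0.0) > now
```

Setting the TTL to a multiple of the heartbeat interval avoids flapping to offline when a single heartbeat is delayed.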

#39Design a Notification System (push/email/SMS). Components and ensuring delivery?

Advanced

Requirements: multi-channel (push, email, SMS), high volume (hundreds of millions of notifications/day), reliable delivery, user preferences.

Architecture: Producer Services → Notification Service → Channel Handlers → Third-party providers.

Notification Service: receives events (order shipped, friend request), looks up user preferences (channel, quiet hours, opt-out), and enqueues into Kafka with a separate topic per channel.
Channel Workers: the Push Notification Worker calls FCM (Android)/APNs (iOS); the Email Worker calls SendGrid/SES; the SMS Worker calls Twilio/SNS.
Reliability: at-least-once delivery via Kafka; store each notification in the DB with a status (pending/sent/failed); retry with exponential backoff; send permanent failures to a dead letter queue.
Rate limiting per user: do not spam a user with 100 notifications at once; aggregate or throttle instead.
Notification template service: versioned templates with i18n support.
User preference service: per-channel opt-in/out, quiet hours, digest mode.
Monitoring: delivery rate per channel, bounce/unsubscribe tracking, P99 latency.

The scale bottleneck is usually the third-party API calls, so use circuit breakers and fallback providers. Because delivery is at-least-once, attach an idempotency key to each notification to avoid duplicates on retry.
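
A channel worker's retry loop might look roughly like the sketch below. The `deliver_with_retry` function, its parameters, and the choice of `ConnectionError` as the transient failure are assumptions for illustration; `send` stands in for a call to a provider such as FCM or SES.

```python
import random
import time
from typing import Callable


def deliver_with_retry(
    notification: dict,
    send: Callable[[dict], None],
    dead_letters: list,
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send; back off exponentially (with full jitter) between attempts."""
    for attempt in range(max_attempts):
        try:
            send(notification)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break
            # The delay ceiling doubles each attempt, capped at max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    # Retries exhausted: park the notification for inspection or replay.
    dead_letters.append(notification)
    return False
```

Jitter spreads retries out so that many workers do not hit a recovering provider at the same instant.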

#40Design a distributed Rate Limiter for an API. Requirements and technical solutions?

Advanced

A distributed rate limiter needs a shared atomic counter, because API servers do not share memory. Redis (with Lua scripts) is the standard solution, with a local-global hybrid as the lower-latency option.

Approach 1 - Centralized Redis: use a Redis Lua script (atomic operations) to implement a sliding window counter or token bucket. Use INCR + EXPIRE for a fixed window, and a sorted set with ZADD/ZREMRANGEBYSCORE for a sliding window log. Pros: centralized and accurate. Cons: Redis becomes a single point of failure (mitigated with replication or Redis Cluster), and every request makes a Redis call (added latency).
Approach 2 - Token Bucket in Redis: store (tokens, last_refill_time) per user in Redis; each request updates them atomically via a Lua script; tokens are refilled according to elapsed time.
Approach 3 - Local + Global: each API server keeps a local counter and syncs with Redis periodically. This reduces Redis calls but is less accurate (the limit can be exceeded temporarily).
Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After.
Multi-level rate limiting: per IP, per user, per endpoint, per API key, each with different limits.
Failure handling: if Redis is down, a circuit breaker decides whether to fail open (allow all requests) or fail closed (block all requests).
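
The token bucket logic from Approach 2 is sketched below as a single-process class. The `TokenBucket` name and its fields are assumptions; in the distributed version, the same read-refill-consume sequence would run inside a Redis Lua script so that it executes atomically.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_per_s: float  # steady-state allowed rate
    tokens: float        # tokens currently available
    last_refill: float   # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to consume `cost` tokens."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

When `allow` returns False, the API would typically answer with HTTP 429 and a Retry-After header.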

#41Design a News Feed like Facebook/Twitter. Fanout strategies and caching?

Advanced

Requirements: users see posts from the people they follow, real-time updates, pagination, ~500M users.

Core challenge: when user A posts, all of A's followers need to see that post in their feed.

Fanout-on-write (Push model): push the post into the feed cache of every follower immediately → feed reads are very fast, but write amplification is large: a user with 1M followers causes 1M cache writes.
Fanout-on-read (Pull model): when a user loads the feed, query everyone they follow, then merge and sort → no write overhead, but reads are slow and expensive.
Hybrid approach (Facebook/Twitter): fanout-on-write for regular users (< N followers) and fanout-on-read for celebrities (> N followers); at read time, merge the pre-computed feed with a real-time pull from celebrities.

Feed Storage: a Redis sorted set with the timestamp as score and post_id as member; paginate with ZREVRANGE (or ZRANGE with REV in newer Redis versions); use a TTL to evict old feeds.
Post storage: a separate service; fetch post content from the DB/cache when rendering the feed.
Ranking: chronological is the simplest; ML-based ranking (engagement prediction) is more complex but tends to keep users engaged longer.

Use cursor-based pagination instead of offset pagination to avoid missing or duplicate items when the feed changes.
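
The hybrid fanout can be sketched with in-memory dictionaries standing in for the follower graph, the per-user feed cache, and post storage. All names and the `CELEBRITY_THRESHOLD` value (the "N" above) are hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # hypothetical N, tiny so the example stays small

followers = defaultdict(set)         # author -> users who follow them
following = defaultdict(set)         # user -> authors they follow
feed_cache = defaultdict(list)       # user -> [(ts, post_id)] pushed at write time
posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def publish(author: str, post_id: str, ts: int) -> None:
    posts_by_author[author].append((ts, post_id))
    # Fanout-on-write only for regular authors.
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        for user in followers[author]:
            feed_cache[user].append((ts, post_id))


def read_feed(user: str, limit: int = 10) -> list:
    # Start from the pre-computed feed, then pull celebrity posts at read time.
    merged = set(feed_cache[user])
    for author in following[user]:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            merged.update(posts_by_author[author])
    return [post_id for _, post_id in sorted(merged, reverse=True)[:limit]]
```

Merging through a set removes duplicates if an author crosses the threshold after some posts were already pushed.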

#42Design a file storage system like Google Drive or Dropbox. Key components?

Advanced

Requirements: upload/download/sync files, share with others, version history, ~1B users, ~10 exabytes of storage.

Chunking: split each file into chunks (4-8MB) and hash every chunk (SHA-256) to detect duplicates (deduplication) and support delta sync (upload only the chunks that changed).
Upload flow: client chunker → compute the hash of each chunk → send the chunk hashes to the server → server replies with the chunks that still need uploading → client uploads the missing chunks to Blob Storage (S3) → server writes metadata to the DB.
Metadata DB: stores the file tree structure, ownership, permissions, and version history; use an RDBMS (MySQL) for ACID and complex queries.
Blob Storage: S3-compatible object storage for raw file chunks.
Deduplication: if two users upload the same file, store one physical copy and reference it from both users, which saves significant storage.
Sync service: when a file changes on device A → upload the delta chunks → notify other devices via WebSocket/long polling → those devices download the changed chunks.
Conflict resolution: last-write-wins, or create a conflict copy as Dropbox does.

Bandwidth optimization: client-side deduplication and delta sync can cut upload volume substantially (up to 90% when most chunks are unchanged). Permission model: owner, editor, viewer; sharing links with expiry. Use a CDN for downloads of popular files. File metadata search uses Elasticsearch.
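
The chunk-hash negotiation in the upload flow can be sketched as below. The function names are hypothetical, and the example passes a tiny chunk size only to keep the data readable; the text above suggests 4-8MB in practice.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, the lower end of the range mentioned above


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(data: bytes, known_hashes: set[str], chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the indexes of chunks whose hash the server does not already store."""
    return [
        index
        for index, digest in enumerate(chunk_hashes(data, chunk_size))
        if digest not in known_hashes
    ]


v1 = b"aaaabbbbcccc"
server_hashes = set(chunk_hashes(v1, chunk_size=4))
v2 = b"aaaaXXXXcccc"  # only the middle chunk changed
missing = chunks_to_upload(v2, server_hashes, chunk_size=4)
```

With fixed-size chunks, inserting bytes near the start of a file shifts every later chunk boundary. Content-defined chunking is one way to limit that effect.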

#43Design a Search Autocomplete / Typeahead system. How to optimize latency and ranking?

Advanced

Requirements: suggest the top-K queries as the user types, end-to-end latency < 100ms, millions of users.

Data collection: log all search queries; an aggregation job (Hadoop/Spark) runs weekly or daily to compute the frequency (or a trending score) of each query.
Trie (Prefix Tree): the ideal data structure for prefix matching. Each node represents one character and stores the top-K queries for its prefix. Query time: traverse the trie along the prefix → O(prefix_length).
Trie storage: a serialized trie kept in a distributed cache (Redis).
Caching: the top 20% of prefixes account for ~80% of traffic → cache aggressively; prefix cache with a 24h TTL; browser cache on the client to reduce requests.
Ranking factors: frequency, freshness (trending queries), personalization (the user's search history).
Trie update: do not update in real time (expensive); rebuild offline and swap atomically.
Scale: shard the trie by first character (26 shards, though letter frequencies are uneven), or use consistent hashing to route each request to the right shard.

Filter: offensive and spam queries must be filtered out before suggestions are shown. API: GET /autocomplete?q={prefix}&limit=10; the server should respond in under 50ms so that end-to-end latency stays within the 100ms requirement.
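
A trie that stores the top-K queries at every node can be sketched as follows. The class names are hypothetical, and the sketch assumes each query is inserted once with its final count, which matches the offline rebuild described above.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.top: list[tuple[int, str]] = []  # (count, query), best first, at most K


class Autocomplete:
    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.root = TrieNode()

    def insert(self, query: str, count: int) -> None:
        """Add a query and refresh the top-K list of every node along its path."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, query))
            # Highest count first; ties broken alphabetically.
            node.top.sort(key=lambda entry: (-entry[0], entry[1]))
            del node.top[self.k:]

    def suggest(self, prefix: str) -> list[str]:
        """Walk the prefix and return the precomputed suggestions at that node."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]
```

Precomputing the top-K list at each node trades memory for lookup speed: a request only walks the prefix and never scans the subtree.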

#44Design a payment system like Stripe. Ensuring correctness and idempotency?

Advanced

Requirements: process payments, prevent double charges, handle failures gracefully, PCI compliance, a complete audit trail.

Idempotency (the most important property): the client sends an Idempotency-Key (UUID) with each request; the server stores {idempotency_key → response} in the DB; if the same key arrives again (because of a retry), the server returns the stored response without processing the payment again, which prevents double charges as long as the client reuses the key on retry.
Payment flow: Create PaymentIntent (created) → the client collects card info (with Stripe.js, card data does not pass through your server, which reduces PCI scope) → Confirm payment → the server calls the payment processor → update the status.
State machine: created → processing → succeeded or failed, and succeeded → refunded; every transition is logged with Event Sourcing.
Outbox Pattern: write the payment record and the outbox event in the same DB transaction; a worker reads the outbox, calls the external API, and updates the status when done.
Reconciliation: every night, compare internal records with the statement from the bank/processor to detect discrepancies.
Security: TLS everywhere, never log card data, tokenization (store a token instead of the raw card number), and an ML fraud detection model.
Retry strategy: exponential backoff with jitter for transient failures; never retry non-idempotent operations without an idempotency key.

Compliance: PCI-DSS, SOC2, GDPR data retention policies.
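
The idempotency-key check can be sketched with a dictionary standing in for the DB table. The `PaymentService` class and the `charge` callable are hypothetical. A production version would also need a unique constraint on the key, handling for two concurrent requests with the same key, and a check that the retried request has the same parameters.

```python
from typing import Callable


class PaymentService:
    def __init__(self, charge: Callable[[int, str], dict]) -> None:
        self._charge = charge                  # hypothetical call to the payment processor
        self._responses: dict[str, dict] = {}  # idempotency_key -> stored response

    def create_payment(self, idempotency_key: str, amount_cents: int, currency: str) -> dict:
        # A retry with a known key returns the stored response; no second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        response = self._charge(amount_cents, currency)
        self._responses[idempotency_key] = response
        return response
```

Amounts are kept as integer minor units (cents) to avoid floating-point rounding in money calculations.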

#33Data Partitioning là gì? Horizontal vs Vertical Partitioning và các chiến lược partition? (What is Data Partitioning? Horizontal vs Vertical Partitioning and partitioning strategies?)

Nâng Cao

#34Time-Series Database là gì? Khi nào cần dùng và các giải pháp phổ biến? (What is a Time-Series Database? When is it needed and what are popular solutions?)

Nâng Cao

Time-Series Database (TSDB) là database được tối ưu đặc biệt cho dữ liệu có timestamp – metrics, sensor readings, financial ticks, logs, IoT data.

Đặc điểm: write-heavy (liên tục insert new data points), dữ liệu cũ ít được query, aggregation theo time windows (avg, sum, min/max trong khoảng thời gian), data retention policies (tự động xóa data cũ).
Vì sao RDBMS không tốt cho time-series: index B-Tree không hiệu quả cho sequential time-based writes, phải manually partition và archive old data.
TSDB tối ưu qua: columnar storage (compress cùng metric đến 10-100x), time-based partitioning built-in, downsampling (giảm resolution của old data), optimized aggregation functions.
InfluxDB: phổ biến nhất, có InfluxQL/Flux query language, built-in retention policies.
TimescaleDB: extension của PostgreSQL, dùng được SQL quen thuộc, hypertables tự động partition.
Prometheus + Grafana: stack tiêu chuẩn cho infrastructure monitoring.
Cassandra: cũng thường dùng cho time-series ở scale rất lớn (IoT).
Use cases: application metrics (Datadog, New Relic dùng TSDB), stock prices, server monitoring, IoT sensor data.

#35Data Lake và Data Warehouse khác nhau như thế nào? Khi nào dùng mỗi loại? (How do Data Lake and Data Warehouse differ? When to use each?)

Nâng Cao

#36Change Data Capture (CDC) là gì? Cách hoạt động và use cases? (What is Change Data Capture (CDC)? How it works and use cases?)

Nâng Cao

Cách hoạt động phổ biến nhất: Log-based CDC đọc database transaction log (WAL trong PostgreSQL, binlog trong MySQL) – non-intrusive, không ảnh hưởng production write performance, capture tất cả changes kể cả DELETE.
Debezium là open-source CDC platform phổ biến nhất, kết nối với PostgreSQL/MySQL/MongoDB và stream changes vào Kafka.
Use cases: cache invalidation (khi DB thay đổi → invalidate Redis cache ngay lập tức); data synchronization (sync dữ liệu từ OLTP sang data warehouse real-time thay vì batch ETL hàng đêm); microservices event sourcing (DB change → event); Elasticsearch sync (full-text search index luôn up-to-date); audit logging.
Lợi thế so với polling: lower latency (seconds thay vì minutes/hours), ít load hơn cho DB, không miss delete events.
Outbox Pattern là cách đảm bảo CDC reliability: write changes vào outbox table trong cùng transaction, CDC đọc từ outbox.

Nâng Cao

Requirements: tạo short URL từ long URL, redirect từ short → long, ~100M URLs/day (write), ~1B redirects/day (read, read-heavy 10:1).

Hash Generation: dùng Base62 encoding (a-zA-Z0-9) trên 7 ký tự = 62^7 ≈ 3.5 nghìn tỷ unique URLs; tránh MD5/SHA vì collision; thay vào đó dùng auto-increment ID convert sang Base62.
Database: lưu short_code → long_url mapping; read-heavy nên cần caching aggressive; có thể dùng Cassandra (scale tốt) hoặc MySQL/PostgreSQL với Redis cache.
Cache: 80% traffic chỉ đến 20% URLs (hot URLs) → cache top URLs trong Redis với LRU eviction, cache hit rate rất cao.
Redirect: 301 (permanent, browser cache – ít load server nhưng không track analytics) vs 302 (temporary, browser không cache – track được mỗi click).

Advanced

Requirements: real-time messaging, online/offline status, message history, 1-1 and group chat.

Connection layer: WebSocket is a better choice than long polling because it is bi-directional, persistent, and low latency; each client maintains one WebSocket connection to a chat server.
Cross-server routing: users connect to different servers, so a pub/sub layer is needed (Redis Pub/Sub or Kafka). When Server A receives a message for User B, who is connected to Server B, it publishes to Redis; Server B is subscribed and pushes the message to User B.
Message storage: Cassandra or a similar wide-column store (HBase at Facebook Messenger, Cassandra at Discord), because the workload is write-heavy with a time-based access pattern. Schema: partition key (conversation_id), clustering key (timestamp, message_id).
Offline messages: if the recipient is offline when a message arrives, store it in the DB; on reconnect, the client fetches unread messages.
Status service: clients send a heartbeat every 5s to track online status, stored in Redis with a TTL.
Fanout for group chat: fanout-on-write (push to all members) vs fanout-on-read (members pull when needed); for large groups fanout-on-write is expensive, so limit group size or use a hybrid approach.
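
The cross-server routing idea can be illustrated with an in-memory stand-in for the pub/sub layer. This is a sketch under stated assumptions: `Broker` and `ChatServer` are hypothetical classes, the `user:<id>` channel naming is invented for the example, and a real deployment would use Redis Pub/Sub or Kafka instead of a Python dictionary.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for the pub/sub layer (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


class ChatServer:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        # user_id -> messages pushed down that user's WebSocket
        self.delivered = defaultdict(list)

    def connect(self, user_id):
        # On connect, this server subscribes to the user's channel.
        self.broker.subscribe(
            f"user:{user_id}",
            lambda message: self.delivered[user_id].append(message),
        )

    def send(self, to_user, text):
        """Publish a message; False means nobody is online to receive it."""
        return self.broker.publish(f"user:{to_user}", text) > 0


broker = Broker()
server_a, server_b = ChatServer("A", broker), ChatServer("B", broker)
server_b.connect("bob")
server_a.send("bob", "hello")  # routed from Server A to Server B
```

When `send` returns `False`, the recipient has no live connection, which is the point where the offline-message path (store in the DB, fetch on reconnect) would take over.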

Advanced

Requirements: multi-channel (push, email, SMS), high volume (hundreds of millions of notifications/day), reliable delivery, user preferences.

Architecture: Producer Services → Notification Service → Channel Handlers → Third-party providers.

Notification Service: receives events (order shipped, friend request), looks up user preferences (channel, quiet hours, opt-out), and enqueues into Kafka with a separate topic per channel.
Channel Workers: the Push Notification Worker calls FCM (Android) or APNs (iOS); the Email Worker calls SendGrid/SES; the SMS Worker calls Twilio/SNS.
Reliability: at-least-once delivery via Kafka; store each notification in the DB with a status (pending/sent/failed); retry with exponential backoff; dead letter queue for permanent failures.
Rate limiting per user: do not spam a user with 100 notifications at once; aggregate or throttle.
Notification template service: versioned templates with i18n support.
User preference service: per-channel opt-in/out, quiet hours, digest mode.
Monitoring: delivery rate per channel, bounce/unsubscribe tracking, P99 latency.

The scaling bottleneck is usually the third-party API calls, which calls for circuit breakers and fallback providers. Because delivery is at-least-once, use an idempotency key to avoid duplicate notifications on retry.
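
The retry and dead-letter behaviour of a channel worker might look like the sketch below. Everything named here is an assumption for illustration: `TransientError`, `deliver`, the four-attempt limit, and the one-second base delay are not taken from any particular provider SDK.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 5xx)."""


def deliver(notification, send, dead_letters, max_attempts=4,
            base_delay=1.0, sleep=time.sleep):
    """Try to send a notification; return 'sent' or 'failed'.

    `send` is a callable that calls the provider and raises TransientError
    on a retryable failure. Delays double after each attempt.
    """
    for attempt in range(max_attempts):
        try:
            send(notification)
            return "sent"
        except TransientError:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    # All attempts failed: park the notification for inspection.
    dead_letters.append(notification)
    return "failed"
```

The returned string maps onto the sent/failed status stored in the DB. Passing `sleep` as a parameter keeps the function testable without real waiting.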

#40Design a distributed Rate Limiter for an API. Requirements and technical solutions?

Advanced

A distributed rate limiter needs a shared atomic counter: Redis (with Lua scripts) is the standard solution, with a local-global hybrid as the lower-latency option.
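
To make the counter logic concrete, here is a single-process sketch using a token bucket, which is one common algorithm and an assumption of this example (the answer above does not name one). The in-memory dictionary stands in for Redis. In a distributed setup, the read, refill, and decrement steps in `allow` would need to run as one atomic unit, which is the role a Lua script plays. The class name and parameters are illustrative.

```python
import time


class TokenBucket:
    """Per-key token bucket; state lives in a dict instead of Redis."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets = {}  # key -> (tokens, last_refill_time)

    def allow(self, key):
        """Return True and consume one token if the key is under its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - last
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


limiter = TokenBucket(capacity=5, refill_per_second=1.0)
print(limiter.allow("api-key-123"))  # True: a fresh bucket starts full
```

The local-global hybrid mentioned above would keep a bucket like this on each API node and reconcile with the shared store periodically, trading some accuracy for lower latency.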

#41Design a News Feed like Facebook/Twitter. Fanout strategies and caching?

Advanced

Requirements: users see posts from the people they follow, real-time updates, pagination, ~500M users.

Core challenge: when user A posts, all of A's followers need to see that post in their feeds.

Fanout-on-write (push model): immediately push the post into the feed cache of every follower. Feed reads are very fast, but write amplification is large: a user with 1M followers causes 1M cache writes.
Fanout-on-read (pull model): when a user loads the feed, query everyone they follow, then merge and sort. There is no write overhead, but reads are slow and expensive.
Hybrid approach (Facebook/Twitter): fanout-on-write for regular users (< N followers), fanout-on-read for celebrities (> N followers); merge the pre-computed feed with a real-time pull from celebrities.

Feed storage: Redis sorted set with the timestamp as score and post_id as member; ZREVRANGE (or ZRANGE with REV on Redis 6.2+) to paginate; TTL to evict old feeds.
Post storage: a separate service; fetch post content from DB/cache when rendering the feed.
Ranking: chronological is simplest; ML-based ranking (engagement prediction) is more complex but tends to keep users engaged longer.

Use cursor-based pagination instead of offset pagination to avoid missing or duplicate items when the feed changes between requests.
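
The hybrid fanout can be sketched in a few lines. This is an illustration under assumptions: `FeedService`, its method names, and the threshold value are hypothetical, and plain dictionaries stand in for the follower graph, the post store, and the per-user feed cache.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical N; real systems use far larger values


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)        # author -> followers
        self.following = defaultdict(set)        # user -> authors
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]
        self.feed_cache = defaultdict(list)      # user -> pushed (ts, post_id)

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > CELEBRITY_THRESHOLD

    def publish(self, author, post_id, ts):
        self.posts_by_author[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fanout-on-write: push into every follower's cached feed.
            for follower in self.followers[author]:
                self.feed_cache[follower].append((ts, post_id))

    def read_feed(self, user, limit=10):
        # Start from the pre-computed feed, then pull celebrity posts.
        # A set removes duplicates if an author crossed the threshold
        # after some posts were already pushed.
        items = set(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts_by_author[author])
        newest_first = heapq.nlargest(limit, items)
        return [post_id for _, post_id in newest_first]
```

A celebrity post costs one write at publish time and one extra lookup per follower read, instead of one cache write per follower.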

#42Design a file storage system like Google Drive or Dropbox. Key components?

Advanced

Requirements: upload/download/sync files, share with others, version history, ~1B users, ~10 exabytes of storage.

Chunking: split each file into chunks (4-8MB); hash each chunk (SHA-256) to detect duplicates (deduplication) and support delta sync (upload only the chunks that changed).
Upload flow: the client chunker computes the hash of each chunk → sends the chunk hashes to the server → the server replies with which chunks need uploading → the client uploads the missing chunks to Blob Storage (S3) → the server writes metadata to the DB.
Metadata DB: stores file tree structure, ownership, permissions, and version history; use an RDBMS (MySQL) for ACID and complex queries.
Blob Storage: S3-compatible object storage for raw file chunks.
Deduplication: if two users upload the same file, store one physical copy referenced by both users, which saves significant storage.
Sync service: when a file changes on device A → upload delta chunks → notify other devices via WebSocket/long polling → devices download the changed chunks.
Conflict resolution: last-write-wins, or create a conflict copy as Dropbox does.
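
The hash-then-upload-missing step of the upload flow can be sketched as below. The function names are hypothetical, and the example assumes fixed-size chunks. With fixed-size chunking, an insertion near the start of a file shifts every later chunk boundary, so production systems may use content-defined chunking instead.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, the low end of the range given above


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(client_hashes, stored_hashes):
    """Return the indexes of chunks the server does not already hold."""
    seen = set(stored_hashes)
    missing = []
    for index, digest in enumerate(client_hashes):
        if digest not in seen:
            missing.append(index)
            seen.add(digest)  # a chunk repeated within the file uploads once
    return missing


# Tiny chunk size so the example is easy to follow.
v1 = b"aaaabbbbcccc"
v2 = b"aaaaXXXXcccc"  # only the middle chunk changed
stored = chunk_hashes(v1, chunk_size=4)
print(chunks_to_upload(chunk_hashes(v2, chunk_size=4), stored))  # [1]
```

The same check gives cross-user deduplication for free: a chunk whose hash is already stored is never uploaded again, regardless of who uploaded it first.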

#43Design a Search Autocomplete / Typeahead system. How to optimize latency and ranking?

Advanced

Requirements: suggest top-K queries as the user types, latency < 100ms, millions of users.

Data collection: log all search queries; an aggregation job (Hadoop/Spark) runs daily or weekly to compute each query's frequency (or a trending score).
Trie (prefix tree): the natural data structure for prefix matching; each node represents one character and stores the top-K queries for its prefix. Query time: traverse the trie along the prefix → O(prefix_length).
Trie storage: serialized trie stored in a distributed cache (Redis).
Caching: the top 20% of prefixes account for ~80% of traffic, so cache aggressively; prefix cache with a 24h TTL; browser cache on the client to reduce requests.
Ranking factors: frequency, freshness (trending queries), personalization (the user's search history).
Trie update: do not update in real time (expensive); rebuild offline and swap atomically.
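Scale: shard the trie by first character (26 shards for a-z) and use consistent hashing to route each request to the right shard; first-character shards are uneven because some letters are far more common than others, so hot shards may need to be split further.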

Filter: offensive/spam queries must be filtered out before display. API: GET /autocomplete?q={prefix}&limit=10, with a server-side target of < 50ms so that end-to-end latency stays within the 100ms requirement.
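
A trie that caches the top-K queries at each node might be built like this. It is a sketch assuming an offline build from already-aggregated counts, with each query inserted exactly once; the class names and the tie-breaking rule (alphabetical) are choices made for the example.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.top = []       # [(frequency, query)], best first, at most K


class AutocompleteTrie:
    def __init__(self, k=3):
        self.k = k
        self.root = TrieNode()

    def insert(self, query, frequency):
        """Add a query and refresh the top-K list on every prefix node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((frequency, query))
            # Highest frequency first; ties broken alphabetically.
            node.top.sort(key=lambda item: (-item[0], item[1]))
            del node.top[self.k:]

    def suggest(self, prefix):
        """Walk the prefix, then read the precomputed list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]


trie = AutocompleteTrie(k=2)
for query, count in {"car": 50, "cat": 80, "cart": 30, "dog": 10}.items():
    trie.insert(query, count)
print(trie.suggest("ca"))  # ['cat', 'car']
```

Storing the top-K list on each node is what keeps lookups at O(prefix_length): no subtree scan happens at query time, at the cost of extra memory and a more expensive build.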

#44Design a payment system like Stripe. Ensuring correctness and idempotency?

Advanced

Requirements: process payments, prevent double charges, handle failures gracefully, PCI compliance, complete audit trail.

Idempotency (most important): the client sends an Idempotency-Key (UUID) with each request; the server stores {idempotency_key → response} in the DB; if the same key arrives again (because of a retry), it returns the stored response without processing again, which prevents double charges.
Payment flow: Create PaymentIntent (created) → client collects card info (Stripe.js sends card data directly to Stripe rather than to the merchant's server, which reduces PCI scope) → Confirm payment → server calls the payment processor → update the status.
State machine: created → processing → succeeded/failed, and succeeded → refunded; every transition is logged with Event Sourcing.
Outbox Pattern: write the payment record and the outbox event in the same DB transaction; a worker reads the outbox, calls the external API, and updates the status when done.
Reconciliation: every night, compare internal records with the statement from the bank/processor to detect discrepancies.
Security: TLS everywhere, never log card data, tokenization (store a token instead of the raw card number), fraud detection ML model.
Retry strategy: exponential backoff with jitter for transient failures; never retry a non-idempotent operation without an idempotency key.

Compliance: PCI-DSS, SOC2, GDPR data retention policies.
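
The idempotency-key check described above can be sketched as follows. The class, its method, and the in-memory dictionary are assumptions for illustration only. A production version would persist the key with a unique constraint before calling the processor, so that two concurrent retries cannot both pass the lookup.

```python
class IdempotentPayments:
    def __init__(self, charge):
        self._charge = charge  # hypothetical callable that calls the processor
        self._seen = {}        # idempotency_key -> (request, response)

    def create_payment(self, idempotency_key, amount_cents, currency):
        request = (amount_cents, currency)
        if idempotency_key in self._seen:
            stored_request, stored_response = self._seen[idempotency_key]
            if stored_request != request:
                # Same key with a different payload is a client bug.
                raise ValueError(
                    "idempotency key reused with different parameters")
            # Retry of a request already handled: replay the stored response.
            return stored_response
        response = self._charge(amount_cents, currency)
        self._seen[idempotency_key] = (request, response)
        return response
```

A retried request with the same key and the same payload returns the original response and never reaches the processor a second time. Rejecting a reused key with a different payload is an extra safeguard added in this sketch.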

Luyện Phỏng Vấn IT — 2000+ Câu Hỏi Phỏng Vấn IT Có Đáp Án 2026

System Design
