A system that reviews PRs automatically with AI, providing feedback before the human reviewer and reducing both review time and the number of bugs that reach production.
Requirements:
- 1,000 engineers × ~5 PRs/week = 5K PRs/week ≈ 700 PRs/day.
- The average PR changes ~200 LOC; some PRs exceed 5,000 LOC.
- Feedback must arrive in < 5 minutes (it must not block the developer).
- Multi-language support (TypeScript, Python, Go, Java).
- Integrates with GitHub / GitLab.
- Respects code privacy (source never leaks outside the company).
High-level architecture:
┌────────────────────────────────┐
│ GitHub / GitLab Webhook │
│ (on PR opened/updated) │
└──────────────┬─────────────────┘
│
▼
┌────────────────────────────────┐
│ Intake Queue (SQS/BullMQ) │
│ - Dedup, priority │
└──────────────┬─────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Orchestrator │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐│
│ │ Context Builder│→│ Parallel Agents│→│ Report Composer││
│ └────────────────┘ └────────────────┘ └────────────────┘│
└──────────┬──────────────────┬──────────────────┬────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Code Indexer │ │ LLM Gateway │ │ Memory/RAG │
│ (symbol, │ │ (Claude 3.5, │ │ (past PR, │
│ ts-server) │ │ GPT-4o) │ │ patterns) │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└───────────────────┴───────────────────┘
│
▼
┌────────────────────────────────┐
│ Post Review to PR │
│ (inline comments, summary) │
└────────────────────────────────┘

Component details:
1. Webhook handler — a lightweight service that receives PR events and enqueues tasks. Supports rate limiting and webhook signature verification.
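GitHub signs each webhook delivery with the configured secret; a minimal sketch of verifying the X-Hub-Signature-256 header (Python here for illustration — the real handler can be any stack):

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks.
    return hmac.compare_digest(expected, signature_header)
```

Reject the request before enqueueing anything if verification fails.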
2. Intake queue — decouples the webhook from processing. Priority: security-critical repos > core > experimental. Deduplicates when a PR is updated several times in quick succession (e.g. after a rebase).
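One way to get both priority and rebase dedup is a keyed priority queue in which a newer push supersedes the pending entry for the same PR. A hedged in-memory sketch (SQS/BullMQ offer their own dedup primitives; the PRIORITY tiers mirror the text above):

```python
import heapq
import itertools

# Lower number = served first: security-critical > core > experimental.
PRIORITY = {"security-critical": 0, "core": 1, "experimental": 2}

class IntakeQueue:
    """Priority queue deduplicated by PR id: a newer push replaces the pending entry."""

    def __init__(self):
        self._heap = []
        self._live = {}                 # pr_id -> the entry currently considered live
        self._seq = itertools.count()   # tie-breaker keeps FIFO order within a priority

    def enqueue(self, pr_id, repo_class, head_sha):
        entry = [PRIORITY[repo_class], next(self._seq), pr_id, head_sha]
        self._live[pr_id] = entry       # supersedes any pending entry for this PR
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        while self._heap:
            entry = heapq.heappop(self._heap)
            pr_id = entry[2]
            if self._live.get(pr_id) is entry:  # skip superseded (deduped) entries
                del self._live[pr_id]
                return pr_id, entry[3]
        return None
```

Only the newest head SHA of a rapidly-updated PR ever reaches the orchestrator.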
3. Context Builder — assembles complete context for the LLM:
- PR diff — unified diff with 3 context lines on each side.
- PR metadata — title, description, linked issues, author history.
- Full content of touched files (for small files).
- Symbol dependencies — where each changed function is called from (via LSP/tree-sitter).
- Related files — tests covering the changed code, related config.
- Codebase conventions — coding standards, style guide, previous review patterns.
- Past similar PRs — from the memory/RAG layer.
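All of that context can exceed the model's window, so the builder has to drop the least important pieces first. A sketch under two assumptions — a rough 4-characters-per-token estimate and an explicit priority order:

```python
def fit_to_budget(sections, budget_tokens, est=lambda s: len(s) // 4):
    """Greedily keep the highest-priority context sections within a token budget.

    `sections` is a list of (priority, name, text) tuples; lower priority number
    means more important. Token cost is roughly estimated at ~4 chars/token.
    """
    kept, used = [], 0
    for _, name, text in sorted(sections):
        cost = est(text)
        if used + cost <= budget_tokens:
            kept.append(name)
            used += cost
    return kept
```

The diff always ranks first; conventions and past PRs are the first to be dropped on small budgets.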
4. Code indexer — pre-builds an index of the codebase:
- Symbol graph (functions/classes/imports) using tree-sitter, ctags, or a language server.
- Embeddings of functions/files → retrieval of similar code.
- Updated incrementally per commit.
- Use Sourcegraph, the Aider repo map, or build your own.
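For illustration only, a Python-specific symbol graph built with the stdlib ast module; tree-sitter or a language server generalizes the same idea across TypeScript, Go, and Java:

```python
import ast

def call_sites(source: str) -> dict:
    """Map each top-level function to the set of function names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return graph
```

Inverting the graph answers the Context Builder's question "where is this changed function called from?".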
5. Parallel review agents — each agent is specialized:
- Security agent — SQL injection, XSS, secret leaks, auth bypass. Security-focused system prompt plus OWASP guidance.
- Bug agent — null references, off-by-one errors, race conditions, resource leaks. Focuses on logic errors.
- Performance agent — N+1 queries, inefficient algorithms, memory leaks.
- Style agent — convention violations, naming, documentation. Rule-based linters first, the LLM for nuance.
- Test coverage agent — does the change have tests, are edge cases covered.
- Architecture agent — separation of concerns, SOLID, dependency violations.
- Documentation agent — missing docstrings, changelog entries.
Agents run in parallel → outputs are merged.
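The fan-out/merge step might look like this with asyncio (the run_agent body is a stub; a real agent would call the LLM gateway):

```python
import asyncio

async def run_agent(name: str, diff: str) -> list:
    """Stand-in for one specialized agent; a real one calls the LLM gateway."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return [{"agent": name, "line": 1, "severity": "suggestion", "msg": f"{name} finding"}]

async def review(diff: str) -> list:
    agents = ["security", "bug", "performance", "style", "tests", "architecture", "docs"]
    # Fan out: every agent reviews the same diff concurrently, then merge findings.
    results = await asyncio.gather(*(run_agent(a, diff) for a in agents))
    return [finding for per_agent in results for finding in per_agent]
```

Because the agents are independent, end-to-end latency is bounded by the slowest agent, not the sum.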
6. LLM strategy:
- Small PRs (< 200 LOC): a single pass with GPT-4o-mini.
- Medium (200-1000 LOC): Claude 3.5 Sonnet with full context.
- Large (> 1000 LOC): chunk by file, review each file, aggregate the results.
- Critical repos (payments, security): always use the strongest model (o1 or Claude 3.5 Sonnet in reasoning mode).
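The routing rules above reduce to a small dispatch function; the returned strings are illustrative labels taken from the text, not necessarily exact API model identifiers:

```python
def pick_model(loc_changed: int, repo_class: str) -> str:
    """Route a PR to a model tier by size and repo criticality."""
    if repo_class == "critical":            # payments, security: strongest model, always
        return "claude-3-5-sonnet (reasoning)"
    if loc_changed < 200:
        return "gpt-4o-mini"
    if loc_changed <= 1000:
        return "claude-3-5-sonnet"
    return "chunk-per-file + aggregate"     # > 1000 LOC: review file-by-file, then merge
```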
7. Memory/RAG layer:
- Past review patterns — when a human reviewer has approved/rejected a similar issue → store it in memory.
- Repo-specific conventions — auto-learned from the codebase.
- The team's common bugs (from incident history and the bug tracker).
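Retrieval over that memory is plain embedding similarity; a toy sketch with hand-written 3-d vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_similar(query_vec, memory, k=2):
    """Return the k past review notes whose embeddings are closest to the query."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return [item["note"] for item in ranked[:k]]
```

In production this lookup sits behind a vector store rather than a linear scan.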
8. Report composer — formats the output for GitHub/GitLab:
- Inline comments on the specific lines that have issues.
- A PR summary aggregating the top concerns.
- Severity labels — 🔴 must-fix, 🟡 suggestion, 🟢 nit.
- Confidence — "I'm 90% sure this is a bug" vs "Consider whether...".
- Citations — links to similar past PRs and docs.
- Auto-suggested fixes when confident (PR suggestion blocks).
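A sketch of composing one inline comment with severity, confidence, and an optional GitHub suggestion block; the 0.9 confidence threshold is an assumption:

```python
FENCE = "`" * 3  # avoids a literal triple-backtick inside this example

def suggestion_comment(message, severity, confidence, fixed_code=None):
    """Render one inline review comment in GitHub-flavored markdown."""
    icon = {"must-fix": "🔴", "suggestion": "🟡", "nit": "🟢"}[severity]
    body = f"{icon} **{severity}** ({confidence:.0%} confident): {message}"
    # A GitHub "suggestion" fenced block lets the author apply the fix in one
    # click; only emit it when the model is confident in the exact replacement.
    if fixed_code is not None and confidence >= 0.9:
        body += f"\n{FENCE}suggestion\n{fixed_code}\n{FENCE}"
    return body
```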
9. Feedback loop — critical for quality:
- Track feedback: human reviewers 👍/👎 AI comments; authors dismiss/apply suggestions.
- Aggregate: false-positive rate, useful-to-noise ratio.
- Fine-tune / adjust prompts based on the feedback.
- Blacklist comment types with a high false-positive rate.
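The blacklisting rule can be as simple as a per-type false-positive rate over recent feedback; the 50% threshold and 20-sample minimum below are illustrative:

```python
from collections import defaultdict

def blacklist(feedback, fp_threshold=0.5, min_samples=20):
    """Blacklist comment types whose observed false-positive rate is too high.

    `feedback` is an iterable of (comment_type, was_useful) pairs collected
    from 👍/👎 reactions and applied/dismissed suggestions.
    """
    stats = defaultdict(lambda: [0, 0])  # type -> [useful_count, total_count]
    for ctype, useful in feedback:
        stats[ctype][0] += int(useful)
        stats[ctype][1] += 1
    return {
        ctype
        for ctype, (useful, total) in stats.items()
        if total >= min_samples and 1 - useful / total > fp_threshold
    }
```

The minimum sample count prevents banning a comment type on a handful of unlucky reactions.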
10. Privacy & security:
- Code never leaves the company: self-hosted LLMs for sensitive repos (Llama 3.3 70B, Qwen 2.5 Coder).
- Enterprise agreements with providers (zero-data-retention with Anthropic, OpenAI).
- Secret scanning before sending prompts (strip API keys and password patterns).
- Audit-log every LLM call.
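A minimal pre-send redaction pass; the two patterns are illustrative only — a production scanner would reuse a maintained ruleset (e.g. from a dedicated secret-scanning tool):

```python
import re

# Illustrative patterns: key/password-style assignments and AWS access key ids.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*['\"]?[^\s'\"]+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),
]

def redact(text: str) -> str:
    """Strip likely secrets from a prompt before it leaves the trust boundary."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```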
Scale considerations:
- Throughput: 700 PRs/day × 6 parallel agents = ~4,200 LLM calls/day. With a 70% prompt-cache hit rate: ~1,260 non-cached calls. Feasible with multiple providers.
- Latency: target p95 < 5 min. Parallelize agents, chunk large PRs, pipeline the steps.
- Cost: estimated $0.5-3 per PR. 700 PRs × $2 = $1,400/day. Compared with the cost of human review ($50-200 of reviewer time per PR), the ROI is clear.
- Storage: PR context, review history, metrics. Postgres + S3.
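The throughput and cost figures are simple back-of-envelope arithmetic:

```python
# Back-of-envelope check of the scale numbers above.
prs_per_day = 700
agents = 6
calls = prs_per_day * agents          # 4,200 LLM calls/day
non_cached = calls * (1 - 0.70)       # 70% prompt-cache hit rate -> 1,260 uncached calls
daily_cost = prs_per_day * 2          # at ~$2 per PR -> $1,400/day
```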
Rollout strategy:
1. Shadow mode — the AI reviews but does not post; compare its output against human reviews. 2 weeks.
2. Opt-in beta — a few teams try it.
3. Default on, easy opt-out — developers can disable it if they don't want it.
4. Gradual trust — suggestions only at first; once accuracy is proven → auto-request-changes for critical issues.
Anti-patterns:
- Reviewing every PR with the same model/depth → wasted cost.
- Too many comments → noise; developers start ignoring them.
- No feedback loop → quality never improves.
- Blocking merges on AI comments → frustrated developers.
- Ignoring codebase context → generic comments.
Real-world implementations:
- GitHub Copilot pull request review (Copilot Workspace), CodeRabbit, Codium PR-Agent, Sweep AI, Greptile, and Vercel Agent all implement a similar pattern.