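Two techniques that extend chain-of-thought (CoT) prompting and help LLMs solve complex reasoning tasks.
Self-Consistency (Wang et al., 2022)
- Idea: sample N different CoT solutions at a relatively high temperature, then take the majority answer. The premise is that when the model reasons correctly, different reasoning paths converge on the same right answer, while wrong paths tend to disagree with each other.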
- Workflow:
1. Send a CoT prompt with temperature=0.7–1.0.
2. Generate N=5–40 solutions.
3. Extract each final answer.
4. Take the majority vote (for tasks with a finite answer set), or weight each answer by the log-probability of its solution.
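- Improvement: roughly +10–20 percentage points of accuracy on GSM8K, MATH, and commonsense reasoning; the exact gain depends on the model and on N.
- Cost: N × the cost of a single CoT call. N=5 is a reasonable balance between accuracy and cost.
- When to use: tasks with a well-defined answer (math, classification, multiple choice). It does not apply directly to open-ended generation, where there is no single answer to vote on.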
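The voting step of the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Answer: <number>` output format, the helper names, and the hard-coded sample completions are all assumptions made for the example. In practice the completions would come from N sampled calls to a model.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion.

    Assumes (hypothetically) that the prompt asked the model to end
    with a line of the form 'Answer: <number>'.
    """
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


def majority_vote(completions: list[str]) -> Optional[str]:
    """Return the most common extracted answer, ignoring unparseable samples."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def logprob_weighted_vote(scored: list[tuple[str, float]]) -> Optional[str]:
    """Weight each answer by exp(total logprob) of the completion that produced it."""
    weights: dict[str, float] = defaultdict(float)
    for completion, logprob in scored:
        answer = extract_answer(completion)
        if answer is not None:
            weights[answer] += math.exp(logprob)
    return max(weights, key=weights.get) if weights else None


# Stand-ins for N=5 sampled CoT completions (made up for illustration).
samples = [
    "3 boxes of 6 is 18, minus 4 is 14. Answer: 14",
    "6 * 3 = 18; 18 - 4 = 14. Answer: 14",
    "6 + 3 = 9, minus 4 is 5. Answer: 5",
    "18 eggs, 4 broken, so 14 remain. Answer: 14",
    "I am not sure how to finish this.",
]
print(majority_vote(samples))  # prints 14
```

Answers are compared as strings here, so a real pipeline would likely normalize them first (for example, treating `14` and `14.0` as the same answer).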
Tree-of-Thoughts (ToT) (Yao et al., 2023)
- Idea: explore multiple branches of reasoning in a "tree", evaluate each, prune weak branches, expand strong ones. Like BFS/DFS over a state space.
- Workflow:
1. Decompose the task into steps.
2. At each step, sample K "thought candidates".
3. Have the LLM self-evaluate each candidate (sure/likely/impossible).
4. Keep the top candidates and expand them in the next step.
5. Backtrack as needed.
- Improvement: dramatic on tasks that need planning (Game of 24: from 4% with CoT to 74% with ToT).
- Cost: very expensive, 100x or more the cost of a single CoT call.
- When to use: tasks that need strategic exploration, such as games, puzzles, and creative writing with constraints.
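The workflow above can be sketched as a breadth-first search with a beam. This is a toy illustration under stated assumptions, not the authors' code: `propose` and `evaluate` stand in for LLM calls and are replaced here by deterministic functions on a made-up task (pick 3 increasing numbers from a pool that sum to a target). The function name `tree_of_thoughts_bfs` and the numeric label scores are hypothetical. Backtracking (step 5) belongs to a depth-first variant and is not shown.

```python
from typing import Callable, Sequence

State = tuple[int, ...]

# Numeric values for the three self-evaluation labels (an assumed mapping).
LABEL_SCORE = {"sure": 2, "likely": 1, "impossible": 0}


def tree_of_thoughts_bfs(
    root: State,
    propose: Callable[[State], Sequence[State]],
    evaluate: Callable[[State], str],
    depth: int,
    beam_width: int,
) -> list[State]:
    """Expand the tree level by level, keeping the best `beam_width` states."""
    frontier = [root]
    for _ in range(depth):
        # Step 2: generate candidate thoughts from every kept state.
        candidates = [child for state in frontier for child in propose(state)]
        # Step 3: score each candidate with the evaluator.
        scored = [(LABEL_SCORE[evaluate(c)], c) for c in candidates]
        # Step 4: prune "impossible" states, then keep the top ones.
        viable = [pair for pair in scored if pair[0] > 0]
        viable.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [c for _, c in viable[:beam_width]]
        if not frontier:
            break
    return frontier


# Toy task standing in for a real problem.
NUMBERS = (1, 2, 3, 4, 5)
TARGET = 10
PICKS = 3


def propose(state: State) -> list[State]:
    """Stand-in for an LLM proposal call: extend with any larger number."""
    start = state[-1] if state else 0
    return [state + (n,) for n in NUMBERS if n > start]


def evaluate(state: State) -> str:
    """Stand-in for an LLM self-evaluation call."""
    remaining = TARGET - sum(state)
    picks_left = PICKS - len(state)
    if picks_left == 0:
        return "sure" if remaining == 0 else "impossible"
    return "likely" if remaining > 0 else "impossible"


solutions = tree_of_thoughts_bfs((), propose, evaluate, depth=PICKS, beam_width=5)
print(solutions)  # prints [(1, 4, 5), (2, 3, 5)]
```

With a real model, each `propose` and `evaluate` call is an LLM request, which is where the high cost of ToT comes from.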
Comparison:
| | Self-Consistency | Tree-of-Thoughts |
|---|---|---|
| Technique | Sample N paths, vote | Tree search with self-eval |
| Implementation | Easy | Complex |
| Cost (vs. one CoT call) | 5–40x | 100–1000x |
| Good for | Finite-answer tasks | Planning, multi-step search |
| Code | Few lines | Custom framework |
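To make the cost row concrete, here is a rough call-count estimate. It is a back-of-the-envelope sketch only: it assumes one model call per sampled solution for Self-Consistency, and for a breadth-first ToT one generation call plus one evaluation call per proposed thought. Real implementations may batch proposals, evaluate each thought several times, or stop early, so actual costs can differ considerably. The function names and parameters are hypothetical.

```python
def self_consistency_calls(n_samples: int) -> int:
    """One CoT call per sampled solution."""
    return n_samples


def tot_bfs_calls(depth: int, beam_width: int, k_candidates: int) -> int:
    """Estimate model calls for a breadth-first ToT run.

    Assumes every kept state proposes k_candidates thoughts, each thought
    costs one generation call and one evaluation call, and the beam is
    always filled when enough proposals exist.
    """
    calls = 0
    frontier = 1  # start from the single root state
    for _ in range(depth):
        proposals = frontier * k_candidates
        calls += proposals  # generation calls
        calls += proposals  # self-evaluation calls
        frontier = min(beam_width, proposals)
    return calls


print(self_consistency_calls(5))                            # prints 5
print(tot_bfs_calls(depth=3, beam_width=5, k_candidates=5))  # prints 110
```

Under these assumptions, a small 3-step tree with a beam of 5 already needs about 110 calls, which is consistent with the 100x order of magnitude in the table.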
When NOT to use: simple tasks (short inputs, easy classification), latency-sensitive settings (a user is waiting), and cost-sensitive settings. Modern reasoning models (o1, o3, Claude with extended thinking) already internalize this kind of thinking, so simple prompting is usually enough.
Variants:
- Graph of Thoughts: generalizes ToT from a tree to a graph, which allows reasoning paths to merge.
- Program of Thoughts: the model emits code instead of natural language for its reasoning steps.
- Algorithm of Thoughts: guides the model along a specific algorithm (BFS, divide-and-conquer).
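As a small illustration of the Program of Thoughts idea, the sketch below runs a program that a model might emit for a word problem and reads the result from a variable. The program text is hand-written for the example, and the convention of storing the result in `ans` and the helper `run_program_of_thought` are assumptions, not a fixed API.

```python
# Hypothetical model output for the question:
# "A shop sells pens at 3 for $2. How much do 12 pens cost?"
generated_program = """
pens = 12
pens_per_bundle = 3
price_per_bundle = 2
ans = pens / pens_per_bundle * price_per_bundle
"""


def run_program_of_thought(program: str):
    """Execute the emitted program and return its `ans` variable.

    Removing builtins only limits accidental misuse; it is NOT a sandbox.
    Untrusted model output should run in an isolated process or container.
    """
    namespace: dict = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace.get("ans")


print(run_program_of_thought(generated_program))  # prints 8.0
```

The arithmetic is done by the interpreter rather than by the model, which is the main appeal of this variant for numeric tasks.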