Self-Consistency and Tree of Thoughts: when should you use them to improve reasoning?

Two techniques that extend chain-of-thought (CoT) prompting to help LLMs on complex reasoning tasks.

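Self-Consistency (Wang et al., 2022)
- Idea: sample N independent CoT reasoning paths at a high temperature, extract each path's final answer, and return the most frequent one (majority vote). The intuition: a hard problem usually admits several valid reasoning paths that reach the same correct answer, while mistakes tend to scatter across different wrong answers.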
- Workflow:
1. Send a CoT prompt with temperature=0.7–1.0.
2. Generate N=5–40 solutions.
3. Extract the final answer from each solution.
4. Aggregate: majority vote over the answers (for finite-answer tasks), or weight each vote by the path's length-normalized probability.
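- Improvement: the paper reports absolute accuracy gains over greedy-decoded CoT of +17.9 points on GSM8K, +12.2 on AQuA and +11.0 on SVAMP, with smaller gains on commonsense reasoning (+6.4 on StrategyQA, +3.9 on ARC-Challenge).
- Cost: about N times the cost of a single CoT call. Gains diminish as N grows, so N=5 is a common cost/accuracy compromise.
- When to use: tasks with a clear final answer (math, classification, multiple choice). It doesn't apply directly to open-ended generation, where there is no exact answer to vote on.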
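The aggregation in steps 3 and 4 can be sketched in a few lines of Python. This is a minimal sketch, assuming the CoT prompt asks the model to end each solution with "The answer is <number>." (a hypothetical convention; adapt `extract_answer` to your own format). Sampling itself is left out: the hard-coded `completions` list stands in for N=5 outputs from whatever LLM client you use, and `weighted_vote` assumes your API can return per-token log-probabilities.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical answer format: the CoT prompt asks the model to finish with
# "The answer is <number>." Adapt the pattern to your own prompt.
ANSWER_RE = re.compile(r"answer is\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_answer(completion):
    """Return the final answer string from one CoT completion, or None."""
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


def majority_vote(completions):
    """Unweighted self-consistency: the most frequent extracted answer wins."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(samples):
    """Weighted variant: each vote counts exp(mean token log-prob) of its path.

    `samples` holds (completion, token_logprobs) pairs; getting per-token
    log-probs from your API is an assumption to check.
    """
    scores = defaultdict(float)
    for completion, token_logprobs in samples:
        answer = extract_answer(completion)
        if answer is None or not token_logprobs:
            continue
        scores[answer] += math.exp(sum(token_logprobs) / len(token_logprobs))
    return max(scores, key=scores.get) if scores else None


# Stand-in for N=5 completions sampled at temperature ~0.7.
completions = [
    "16 - 3 - 4 = 9 eggs left; 9 * 2 = 18. The answer is 18.",
    "She sells 9 eggs at $2 each. The answer is 18.",
    "16 - 3 = 13; 13 * 2 = 26. The answer is 26.",
    "9 eggs * $2 = $18. The answer is 18.",
    "I am not sure how many eggs are left.",
]
print(majority_vote(completions))  # prints 18
```

Plain voting needs nothing beyond the sampled text, and the Self-Consistency paper reports that it performs about as well as the length-normalized weighted sum, so it is usually the first variant to try.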

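Tree of Thoughts (ToT) (Yao et al., 2023)
- Idea: treat reasoning as search over a tree whose nodes are partial solutions ("thoughts"). At each node the model proposes several next thoughts and evaluates them; weak branches are pruned and promising ones expanded, using a search algorithm such as BFS or DFS over this state space.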
- Workflow:
1. Decompose the task into steps.
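2. At each step, sample K candidate thoughts.
3. The LLM evaluates each candidate, e.g. by labeling it sure/likely/impossible to lead to a solution.
4. Keep the top candidates (up to a breadth limit) and expand them at the next step.
5. Backtrack (in the DFS variant) when a branch is judged impossible.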
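- Improvement: large on tasks that need planning or search. On Game of 24 with GPT-4, the success rate rose from 4% with CoT prompting to 74% with ToT.
- Cost: very expensive; each problem needs many propose and evaluate calls, often 100x or more the cost of a single CoT call depending on breadth and depth.
- When to use: tasks that require strategic exploration or lookahead, such as games, puzzles, and creative writing under constraints.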
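The BFS variant of the workflow above can be sketched on Game of 24 (combine four numbers with +, -, *, / to reach 24). To keep it runnable without a model, `propose` and `evaluate` are hypothetical deterministic stand-ins for the two LLM calls: `propose` enumerates every way to combine two remaining numbers (a real proposer would sample only a few), and `evaluate` checks reachability exactly for small states, where a real LLM judge would be cheaper per state but noisy. The 20/1/0.001 scores for sure/likely/impossible are an assumed mapping from labels to numbers.

```python
from fractions import Fraction
from itertools import combinations

# Assumed numeric scores for the evaluator's three labels.
VALUE = {"sure": 20.0, "likely": 1.0, "impossible": 0.001}


def propose(numbers):
    """Stand-in for the LLM 'propose' call.

    A thought combines two remaining numbers with one operation; returns
    (remaining_numbers, thought_text) pairs.
    """
    candidates = []
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        results = {f"{a} + {b}": a + b, f"{a} * {b}": a * b,
                   f"{a} - {b}": a - b, f"{b} - {a}": b - a}
        if b != 0:
            results[f"{a} / {b}"] = a / b
        if a != 0:
            results[f"{b} / {a}"] = b / a
        for expr, value in results.items():
            candidates.append((tuple(rest + [value]), f"{expr} = {value}"))
    return candidates


def can_reach_24(numbers):
    """Brute-force reachability check used only by the stub evaluator."""
    if len(numbers) == 1:
        return numbers[0] == 24
    return any(can_reach_24(nxt) for nxt, _ in propose(numbers))


def evaluate(numbers):
    """Stand-in for the LLM 'value' call: sure / likely / impossible."""
    if len(numbers) <= 3:
        return "sure" if can_reach_24(numbers) else "impossible"
    return "likely"


def tot_bfs(start, breadth=5):
    """BFS Tree of Thoughts: expand kept states, score children, keep the top `breadth`."""
    frontier = [(tuple(Fraction(n) for n in start), [])]
    for _ in range(len(start) - 1):  # every thought removes one number
        children = []
        for numbers, steps in frontier:
            for nxt, thought in propose(numbers):
                children.append((VALUE[evaluate(nxt)], nxt, steps + [thought]))
        # Prune: sort by score only (stable, so ties keep proposal order).
        children.sort(key=lambda child: child[0], reverse=True)
        frontier = [(nxt, steps) for _, nxt, steps in children[:breadth]]
    for numbers, steps in frontier:
        if numbers[0] == 24:
            return steps
    return None


print(tot_bfs([4, 9, 10, 13]))  # three thoughts, the last ending in "= 24"
```

A DFS variant (step 5) would instead recurse into the highest-scoring child first and return as soon as `evaluate` says "impossible", backtracking to the next sibling.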

Comparison:

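	Self-Consistency	Tree of Thoughts
Technique	Sample N paths, majority vote	Tree search with LLM self-evaluation
Implementation	Easy	Complex
Cost (vs. one CoT call)	5–40x (N samples)	100–1000x (depends on breadth and depth)
Good for	Tasks with a finite answer set	Planning, search, multi-step tasks
Code	A few lines	A custom search loop or framework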

When NOT to use: simple, short tasks (e.g. easy classification), latency-sensitive paths where a user is waiting, and cost-sensitive workloads. Reasoning models (o1, o3, Claude with extended thinking) already reason at length internally, so simple prompting is often enough.

Variants:
- Graph of Thoughts: generalizes ToT from a tree to a graph, so thoughts from different paths can be merged.
- Program of Thoughts: the model writes code instead of natural language for the reasoning steps, and an interpreter executes it to compute the answer.
- Algorithm of Thoughts: guides the model to follow a specific algorithm (e.g. BFS, divide-and-conquer).
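For Program of Thoughts, the step that differs from CoT is executing the generated program. A minimal sketch follows: the `generated` string is a hard-coded stand-in for model output, and the convention that the program stores its result in a variable named `ans` is an assumption you would state in the prompt. `exec` runs arbitrary code, so a real system needs a proper sandbox; the plain namespace dictionary here is not one.

```python
def run_program_of_thought(program):
    """Execute model-written Python and return the value bound to `ans`.

    Assumed convention: the prompt tells the model to store its final result
    in a variable named `ans`. exec() is NOT a sandbox.
    """
    namespace = {}
    exec(program, namespace)
    return namespace.get("ans")


# Stand-in for code the model emitted for: "16 eggs a day, 3 eaten, 4 used
# for baking, the rest sold at $2 each. How much does she earn?"
generated = """
eggs_per_day = 16
eaten = 3
baked = 4
price = 2
ans = (eggs_per_day - eaten - baked) * price
"""
print(run_program_of_thought(generated))  # prints 18
```

Here the arithmetic is done by the interpreter rather than by the model; the model only has to produce a correct program.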
