Overview
Tree of Thoughts (ToT) is the most sophisticated execution mode in Nadoo AI. Instead of following a single reasoning path, it generates multiple independent reasoning paths in parallel, evaluates each path, prunes unpromising ones, and expands the most promising paths deeper. The best complete path is selected as the final response. This mode is designed for problems where there are multiple valid approaches and the optimal one is not obvious upfront. By exploring several paths simultaneously, ToT can find solutions that linear reasoning would miss.
How It Works
Generate Initial Thoughts
The LLM generates num_thoughts independent reasoning paths at the first level. Each thought represents a different approach to the problem.
Evaluate
Each thought is scored using the configured evaluation_strategy (numeric scoring or pairwise voting).
Prune
Thoughts scoring below pruning_threshold (relative to the max score) are discarded. This focuses computational resources on the most promising paths.
Expand
Surviving thoughts are expanded to the next depth level. Each surviving thought generates num_thoughts child thoughts, continuing the reasoning deeper.
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| agent_mode | string | — | Must be "tree_of_thoughts" |
| tot_config.num_thoughts | number | 3 | Number of parallel reasoning paths at each level |
| tot_config.depth | number | 2 | Number of levels to explore in the tree |
| tot_config.evaluation_strategy | string | "vote" | How to score paths: "score" or "vote" |
| tot_config.pruning_threshold | float | 0.3 | Minimum score, relative to the level's max, required to survive pruning |
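Putting the parameters above together, a ToT node configuration might look like the following. This is a sketch based on the parameter table; the exact placement of these fields within a full node definition may differ.

```json
{
  "agent_mode": "tree_of_thoughts",
  "tot_config": {
    "num_thoughts": 3,
    "depth": 2,
    "evaluation_strategy": "vote",
    "pruning_threshold": 0.3
  }
}
```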
Configuration Parameters in Depth
num_thoughts
Controls the breadth of exploration at each level. More thoughts means more approaches are considered, but each additional thought adds LLM calls.

| Value | Behavior | LLM Calls (approx.) |
|---|---|---|
| 2 | Two alternatives explored | Moderate |
| 3 | Three alternatives (recommended) | Good balance |
| 5 | Five alternatives | High cost, thorough exploration |
depth
Controls how deep the reasoning goes. Each depth level refines and extends the thoughts from the previous level.

| Value | Behavior |
|---|---|
| 1 | Generate and evaluate thoughts, select the best (no expansion) |
| 2 | Generate, evaluate, expand the survivors, evaluate again, select (recommended) |
| 3 | Three levels of exploration (expensive, use for complex problems only) |
evaluation_strategy
Two strategies are available: score-based evaluation and pairwise voting.
Score-Based Evaluation
The LLM assigns a numeric score (0.0-1.0) to each thought based on how promising it is.
Vote-Based Evaluation
Thoughts at the same level are compared pairwise, and the LLM votes for the more promising thought in each pair.
pruning_threshold
Controls how aggressively underperforming paths are pruned. The threshold is relative to the maximum score at that level.

| Threshold | Behavior |
|---|---|
| 0.0 | No pruning — all thoughts advance to the next level |
| 0.3 | Moderate pruning — thoughts scoring below 30% of the max are dropped |
| 0.5 | Aggressive pruning — thoughts scoring below half of the max are dropped |
| 0.8 | Very aggressive — only near-best thoughts survive |
Example with pruning_threshold: 0.3:
If thought scores are [0.85, 0.40, 0.72], the max is 0.85 and the cutoff is 0.85 * 0.3 = 0.255. Since all scores exceed 0.255, all three thoughts survive. A thought scoring 0.20, however, would be pruned.
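The cutoff arithmetic can be sketched in a few lines. This is a hypothetical helper for illustration, not the Nadoo AI implementation:

```python
def prune(scores: list[float], pruning_threshold: float) -> list[float]:
    """Keep only scores at or above pruning_threshold * max(scores)."""
    cutoff = max(scores) * pruning_threshold
    return [s for s in scores if s >= cutoff]

# All three thoughts clear the 0.255 cutoff (0.85 * 0.3):
print(prune([0.85, 0.40, 0.72], 0.3))        # [0.85, 0.4, 0.72]

# A weak fourth thought falls below the cutoff and is dropped:
print(prune([0.85, 0.40, 0.72, 0.20], 0.3))  # [0.85, 0.4, 0.72]
```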
SSE Events
Tree of Thoughts mode emits these events:

| Event | When | Payload |
|---|---|---|
| node_started | Node begins | { node_id } |
| agent_thinking | Each thought is generated | { thought_id, depth, content, node_id } |
| agent_thinking | Each thought is evaluated | { thought_id, depth, score, node_id } |
| agent_thinking | Pruning decision | { pruned_thoughts, surviving_thoughts, depth, node_id } |
| llm_token | Each token generated | { token, node_id } |
| llm_finished | Best path selected | { node_id, total_tokens, winning_path } |
| node_finished | Node completes | { node_id, status } |
The agent_thinking event is used throughout the ToT process to communicate the tree exploration to the client. Clients can use these events to visualize the branching thought process.
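As a sketch of such a visualization, a client could fold the agent_thinking stream into a per-depth map of thoughts. The code below assumes each SSE message has already been parsed into an event name and a JSON data string, with the payload fields from the table above:

```python
import json

def build_tree(events: list[tuple[str, str]]) -> dict:
    """Fold (event, json_data) SSE messages into {depth: {thought_id: info}}."""
    tree: dict[int, dict[str, dict]] = {}
    for event, data in events:
        if event != "agent_thinking":
            continue  # node_started, llm_token, etc. are ignored here
        payload = json.loads(data)
        if "content" in payload:    # a thought was generated
            tree.setdefault(payload["depth"], {})[payload["thought_id"]] = {
                "content": payload["content"]
            }
        elif "score" in payload:    # an existing thought was evaluated
            tree[payload["depth"]][payload["thought_id"]]["score"] = payload["score"]
    return tree

stream = [
    ("node_started", '{"node_id": "n1"}'),
    ("agent_thinking", '{"thought_id": "t1", "depth": 1, "content": "Approach A", "node_id": "n1"}'),
    ("agent_thinking", '{"thought_id": "t1", "depth": 1, "score": 0.85, "node_id": "n1"}'),
]
print(build_tree(stream))  # {1: {'t1': {'content': 'Approach A', 'score': 0.85}}}
```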
Example: Strategic Planning
A user asks: “How should we expand our product into the Japanese market?”

Level 1: Three Initial Approaches
Evaluation
Level 2: Expand Top Paths
Final Selection
Example: Creative Writing
Computational Cost
Tree of Thoughts is the most expensive mode. The total number of LLM calls grows with both num_thoughts and depth:
| Phase | LLM Calls |
|---|---|
| Level 1 generation | num_thoughts |
| Level 1 evaluation (score) | num_thoughts |
| Level 1 evaluation (vote) | num_thoughts * (num_thoughts - 1) / 2 |
| Level 2 generation | surviving_thoughts * num_thoughts |
| Level 2 evaluation | Same as Level 1 for expanded thoughts |
| Final synthesis | 1 |
For example, with num_thoughts: 3, depth: 2, and vote evaluation:
- Level 1: 3 generations + 3 pairwise comparisons = 6 calls
- Level 2 (assuming 2 survive): 6 generations + 15 comparisons = 21 calls
- Final: 1 call
- Total: ~28 LLM calls
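The arithmetic above can be generalized in a short helper. This sketch assumes vote evaluation, that all thoughts at a level are compared pairwise, and that a fixed number of thoughts survive pruning at each level; the actual counts depend on how many thoughts the pruning threshold keeps.

```python
def tot_llm_calls(num_thoughts: int, depth: int, survivors_per_level: int) -> int:
    """Approximate total LLM calls for vote-based Tree of Thoughts."""
    total = 0
    thoughts = num_thoughts  # thoughts generated at level 1
    for _ in range(depth):
        total += thoughts                        # generations at this level
        total += thoughts * (thoughts - 1) // 2  # pairwise vote comparisons
        thoughts = survivors_per_level * num_thoughts  # next level's generations
    return total + 1  # final synthesis call

print(tot_llm_calls(3, 2, 2))  # 28, matching the worked example above
```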
Performance Characteristics
| Metric | Tree of Thoughts |
|---|---|
| LLM calls per execution | 10-50+ depending on configuration |
| Latency | Very High (many sequential and parallel LLM calls) |
| Token usage | 5-20x Standard |
| Quality ceiling | Highest (explores multiple approaches) |
When to Use Tree of Thoughts
| Scenario | Recommended? | Why |
|---|---|---|
| Strategic business decisions | Yes | Multiple valid approaches worth exploring |
| Creative content with multiple angles | Yes | Diverse perspectives improve quality |
| Complex problem with no clear path | Yes | Exploration prevents premature commitment |
| Simple Q&A or factual retrieval | No | Massive overkill, use Standard |
| Time-sensitive responses | No | Too slow for real-time interaction |
| Budget-constrained workflows | No | Token cost is very high |
Best Practices
Start with num_thoughts: 3 and depth: 2
This is the recommended starting point. It explores 3 approaches with one level of refinement — a good balance of exploration and cost. Increase only after confirming that broader/deeper exploration yields meaningfully better results.
Use vote evaluation for subjective tasks
Pairwise voting is more robust than numeric scoring for creative, strategic, or subjective tasks where absolute scores are hard to calibrate. Use score-based evaluation for more objective, measurable criteria.
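One common way to realize pairwise voting is a round-robin tally where each comparison awards a point to the winner. This is an illustrative sketch, not Nadoo AI's actual tallying; in practice prefer(a, b) would be an LLM judgment call rather than the stand-in judge used here.

```python
from itertools import combinations

def rank_by_votes(thoughts: list[str], prefer) -> list[str]:
    """Round-robin pairwise voting: prefer(a, b) returns the winner of one
    comparison. Thoughts are ranked by total wins, best first."""
    wins = {t: 0 for t in thoughts}
    for a, b in combinations(thoughts, 2):
        wins[prefer(a, b)] += 1
    return sorted(thoughts, key=lambda t: wins[t], reverse=True)

# Stand-in judge for illustration only: prefer the longer thought.
ranked = rank_by_votes(
    ["short", "a medium thought", "the longest thought here"],
    prefer=lambda a, b: max(a, b, key=len),
)
print(ranked[0])  # "the longest thought here"
```

Note that n thoughts require n * (n - 1) / 2 comparisons, which is where the vote-evaluation row in the cost table comes from.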
Set higher temperature for initial generation
Use temperature 0.7-0.9 for ToT to encourage diverse initial thoughts. If all three initial thoughts are similar, the mode loses its advantage over Chain of Thought.
Reserve for high-value outputs
Given the computational cost, use ToT only for decisions or outputs where exploring multiple paths provides clear value: strategy documents, architecture decisions, creative campaigns.
Combine with other modes in a workflow
Use ToT for the critical decision point, and Standard or Chain of Thought for surrounding nodes. For example: Standard for intake, ToT for strategy generation, Reflection for polishing the output.