
Overview

Tree of Thoughts (ToT) is the most sophisticated execution mode in Nadoo AI. Instead of following a single reasoning path, it generates multiple independent reasoning paths in parallel, evaluates each path, prunes unpromising ones, and expands the most promising paths deeper. The best complete path is selected as the final response. This mode is designed for problems where there are multiple valid approaches and the optimal one is not obvious upfront. By exploring several paths simultaneously, ToT finds solutions that linear reasoning might miss.

How It Works

1. Generate Initial Thoughts: The LLM generates num_thoughts independent reasoning paths at the first level. Each thought represents a different approach to the problem.
2. Evaluate: Each thought is scored using the configured evaluation_strategy (numeric scoring or pairwise voting).
3. Prune: Thoughts scoring below pruning_threshold (relative to the max score) are discarded. This focuses computational resources on the most promising paths.
4. Expand: Surviving thoughts are expanded to the next depth level. Each thought generates num_thoughts child thoughts, continuing the reasoning deeper.
5. Select Best: After reaching the configured depth, the highest-scoring complete path is selected, and its result becomes the final response.
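The five steps above can be sketched as a simple level-by-level loop. This is an illustrative sketch only, not Nadoo AI's implementation: `generate_thought` and `evaluate` stand in for LLM calls, and the real engine carries full reasoning context rather than toy path labels.

```python
import random

def generate_thought(parent):
    # Stand-in for an LLM call that extends a reasoning path by one step.
    step = f"step-{random.random():.3f}"
    return {"path": (parent["path"] + [step]) if parent else [step]}

def evaluate(thought):
    # Stand-in for the configured evaluation_strategy; returns a 0.0-1.0 score.
    return random.random()

def tree_of_thoughts(num_thoughts=3, depth=2, pruning_threshold=0.3):
    frontier = [None]  # the level-0 "parent" is the bare problem statement
    best_score, best_path = 0.0, None
    for level in range(depth):
        # Steps 1 and 4: each surviving parent generates num_thoughts children.
        scored = []
        for parent in frontier:
            for _ in range(num_thoughts):
                thought = generate_thought(parent)
                # Step 2: score every thought at this level.
                scored.append((evaluate(thought), thought))
        # Step 3: prune thoughts scoring below pruning_threshold * max score.
        max_score = max(score for score, _ in scored)
        survivors = [(s, t) for s, t in scored if s >= pruning_threshold * max_score]
        frontier = [t for _, t in survivors]
        # Step 5: the best thought at the deepest level is the winning path.
        best_score, best_path = max(survivors, key=lambda st: st[0])
    return best_score, best_path
```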

Configuration

{
  "type": "ai-agent-node",
  "config": {
    "agent_mode": "tree_of_thoughts",
    "model": "gpt-4o",
    "system_prompt": "You are a strategic planning advisor. Explore multiple approaches thoroughly before recommending the best one.",
    "tot_config": {
      "num_thoughts": 3,
      "depth": 2,
      "evaluation_strategy": "vote",
      "pruning_threshold": 0.3
    },
    "temperature": 0.8,
    "max_tokens": 8192
  }
}
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| agent_mode | string |  | Must be "tree_of_thoughts" |
| tot_config.num_thoughts | number | 3 | Number of parallel reasoning paths at each level |
| tot_config.depth | number | 2 | Number of levels to explore in the tree |
| tot_config.evaluation_strategy | string | "vote" | How to score paths: "score" or "vote" |
| tot_config.pruning_threshold | float | 0.3 | Minimum score relative to max to survive pruning |
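The defaults above can be captured in code. The dataclass below is illustrative, not part of any Nadoo AI SDK; field names mirror tot_config.

```python
from dataclasses import dataclass

@dataclass
class ToTConfig:
    """Mirrors tot_config with the documented defaults."""
    num_thoughts: int = 3               # parallel reasoning paths per level
    depth: int = 2                      # levels to explore in the tree
    evaluation_strategy: str = "vote"   # "score" or "vote"
    pruning_threshold: float = 0.3      # minimum score relative to the max

    def __post_init__(self):
        if self.evaluation_strategy not in ("score", "vote"):
            raise ValueError("evaluation_strategy must be 'score' or 'vote'")
        if not 0.0 <= self.pruning_threshold <= 1.0:
            raise ValueError("pruning_threshold must be in [0, 1]")
```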

Configuration Parameters in Depth

num_thoughts

Controls the breadth of exploration at each level. More thoughts means more approaches are considered, but each additional thought adds LLM calls.
| Value | Behavior | LLM Calls (approx.) |
| --- | --- | --- |
| 2 | Two alternatives explored | Moderate |
| 3 | Three alternatives (recommended) | Good balance |
| 5 | Five alternatives | High cost, thorough exploration |

depth

Controls how deep the reasoning goes. Each depth level refines and extends the thoughts from the previous level.
| Value | Behavior |
| --- | --- |
| 1 | Generate and evaluate thoughts, select the best (no expansion) |
| 2 | Generate, evaluate, expand the best, evaluate again, select (recommended) |
| 3 | Three levels of exploration (expensive, use for complex problems only) |

evaluation_strategy

Score-Based Evaluation

The LLM assigns a numeric score (0.0-1.0) to each thought based on how promising it is.
{
  "evaluation_strategy": "score"
}
How it works: Each thought is evaluated independently with a prompt like “Rate this approach on a scale of 0 to 1 based on feasibility, creativity, and completeness.”
Pros: Fast (one LLM call per thought), deterministic scoring.
Cons: Scores may not be well-calibrated across different thoughts.

Vote-Based Evaluation

Thoughts at the same level are compared in pairs, and each comparison asks the LLM which approach is more promising. This is the default strategy.
{
  "evaluation_strategy": "vote"
}
How it works: Every unordered pair of thoughts is compared, requiring num_thoughts * (num_thoughts - 1) / 2 LLM calls per level.
Pros: More robust than numeric scoring for creative, strategic, or subjective tasks where absolute scores are hard to calibrate.
Cons: The number of comparisons grows quadratically with num_thoughts.
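A minimal sketch of how the pairwise "vote" strategy could tally scores. Here `compare_pair` stands in for an LLM call asking which of two thoughts is better; the win-counting and normalization are illustrative assumptions, not the documented algorithm.

```python
from itertools import combinations

def vote_rank(thoughts, compare_pair):
    """Score thoughts by pairwise wins: one comparison per unordered pair,
    i.e. n * (n - 1) / 2 calls to compare_pair for n thoughts."""
    wins = {i: 0 for i in range(len(thoughts))}
    for i, j in combinations(range(len(thoughts)), 2):
        winner = i if compare_pair(thoughts[i], thoughts[j]) else j
        wins[winner] += 1
    # Normalize win counts to 0-1 so pruning_threshold applies uniformly.
    top = max(wins.values()) or 1
    return [wins[i] / top for i in range(len(thoughts))]

# Toy comparator that prefers the longer thought:
scores = vote_rank(["short", "a bit longer", "the longest thought"],
                   lambda a, b: len(a) > len(b))  # → [0.0, 0.5, 1.0]
```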

pruning_threshold

Controls how aggressively underperforming paths are pruned. The threshold is relative to the maximum score at that level.
| Threshold | Behavior |
| --- | --- |
| 0.0 | No pruning: all thoughts advance to the next level |
| 0.3 | Moderate pruning: thoughts scoring below 30% of the max are dropped |
| 0.5 | Aggressive pruning: only the top half survive |
| 0.8 | Very aggressive: only near-best thoughts survive |
Example with pruning_threshold: 0.3: If thought scores are [0.85, 0.40, 0.72], the max is 0.85. The threshold cutoff is 0.85 * 0.3 = 0.255. Since all scores exceed 0.255, all thoughts survive. But if one thought scored 0.20, it would be pruned.
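The worked example above, reproduced as a small helper (a sketch of relative pruning, not the exact implementation):

```python
def prune(scores, pruning_threshold=0.3):
    """Return the indices of thoughts whose score meets
    pruning_threshold * max(scores)."""
    cutoff = pruning_threshold * max(scores)
    return [i for i, s in enumerate(scores) if s >= cutoff]

print(prune([0.85, 0.40, 0.72]))        # → [0, 1, 2]  (cutoff 0.255, all survive)
print(prune([0.85, 0.40, 0.72, 0.20]))  # → [0, 1, 2]  (0.20 is pruned)
```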

SSE Events

Tree of Thoughts mode emits these events:
| Event | When | Payload |
| --- | --- | --- |
| node_started | Node begins | { node_id } |
| agent_thinking | Each thought is generated | { thought_id, depth, content, node_id } |
| agent_thinking | Each thought is evaluated | { thought_id, depth, score, node_id } |
| agent_thinking | Pruning decision | { pruned_thoughts, surviving_thoughts, depth, node_id } |
| llm_token | Each token generated | { token, node_id } |
| llm_finished | Best path selected | { node_id, total_tokens, winning_path } |
| node_finished | Node completes | { node_id, status } |
The agent_thinking event is used throughout the ToT process to communicate the tree exploration to the client. Clients can use these events to visualize the branching thought process.
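A client might fold these events into a tree view as they arrive. The sketch below assumes events have already been parsed into (event, data) pairs matching the payloads above; SSE wire-format parsing is omitted, and it assumes each thought's generation event arrives before its score.

```python
def handle_event(event, data, tree):
    """Accumulate agent_thinking events into a simple view keyed by depth."""
    if event == "agent_thinking" and "content" in data:
        # A new thought was generated at data["depth"].
        tree.setdefault(data["depth"], {})[data["thought_id"]] = {"content": data["content"]}
    elif event == "agent_thinking" and "score" in data:
        # An existing thought was scored (assumes its generation event came first).
        tree[data["depth"]][data["thought_id"]]["score"] = data["score"]
    elif event == "agent_thinking" and "pruned_thoughts" in data:
        for tid in data["pruned_thoughts"]:
            tree[data["depth"]][tid]["pruned"] = True
    elif event == "llm_finished":
        tree["winning_path"] = data["winning_path"]
    return tree
```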

Example: Strategic Planning

A user asks: “How should we expand our product into the Japanese market?”

Level 1: Three Initial Approaches

Thought A: Partnership Strategy
Form strategic partnerships with established Japanese companies
for distribution and localization...

Thought B: Direct Entry Strategy
Establish a local office in Tokyo, hire a local team, and
build direct customer relationships...

Thought C: Digital-First Strategy
Launch online with Japanese localization, use digital marketing
and social media to build brand awareness before physical presence...

Evaluation

Thought A: score 0.82 (strong local knowledge, lower risk, shared control)
Thought B: score 0.65 (full control, high cost, slow start)
Thought C: score 0.78 (fast launch, lower cost, limited market depth)

Level 2: Expand Top Paths

Thought A.1: Partnership with a major tech distributor + localized
product with Japanese-first support...

Thought A.2: Partnership with a consulting firm for enterprise
sales channel...

Thought C.1: Digital launch targeting SMBs with freemium model
+ social media presence...

Final Selection

Best path: Thought A.1 (score 0.91)
-- Partnership with tech distributor provides immediate market access
with localized product and support.

Example: Creative Writing

{
  "agent_mode": "tree_of_thoughts",
  "model": "gpt-4o",
  "system_prompt": "You are a creative writing advisor. Explore multiple narrative approaches before recommending the most compelling one.",
  "tot_config": {
    "num_thoughts": 3,
    "depth": 2,
    "evaluation_strategy": "vote"
  },
  "temperature": 0.9
}
The high temperature encourages diverse initial thoughts, while the vote-based evaluation helps select the most compelling narrative.

Computational Cost

Tree of Thoughts is the most expensive mode. The total number of LLM calls grows with both num_thoughts and depth:
| Phase | LLM Calls |
| --- | --- |
| Level 1 generation | num_thoughts |
| Level 1 evaluation (score) | num_thoughts |
| Level 1 evaluation (vote) | num_thoughts * (num_thoughts - 1) / 2 |
| Level 2 generation | surviving_thoughts * num_thoughts |
| Level 2 evaluation | Same as Level 1, applied to the expanded thoughts |
| Final synthesis | 1 |
Example with default config (3 thoughts, depth 2, vote evaluation):
  • Level 1: 3 generations + 3 comparisons = 6 calls
  • Level 2 (assuming 2 survive): 6 generations + 15 comparisons = 21 calls
  • Final: 1 call
  • Total: ~28 LLM calls
The cost grows rapidly. A configuration of num_thoughts: 5, depth: 3 with vote evaluation can result in 100+ LLM calls. Use this mode judiciously and only for high-value decisions.
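The arithmetic above can be reproduced with a small estimator. It is an approximation under the assumption of a fixed number of survivors per level; the real call count depends on how pruning plays out.

```python
def estimate_calls(num_thoughts=3, depth=2, strategy="vote", survivors_per_level=2):
    """Approximate total LLM calls for a Tree of Thoughts run."""
    calls = 0
    parents = 1  # level 1 expands directly from the problem statement
    for _ in range(depth):
        generated = parents * num_thoughts
        calls += generated                               # generation
        if strategy == "vote":
            calls += generated * (generated - 1) // 2    # pairwise comparisons
        else:
            calls += generated                           # one score per thought
        parents = min(survivors_per_level, generated)    # assumed survivors
    return calls + 1                                     # final synthesis

print(estimate_calls())  # → 28 (matches the worked example above)
```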

Performance Characteristics

| Metric | Tree of Thoughts |
| --- | --- |
| LLM calls per execution | 10-50+ depending on configuration |
| Latency | Very high (many sequential and parallel LLM calls) |
| Token usage | 5-20x Standard |
| Quality ceiling | Highest (explores multiple approaches) |

When to Use Tree of Thoughts

| Scenario | Recommended? | Why |
| --- | --- | --- |
| Strategic business decisions | Yes | Multiple valid approaches worth exploring |
| Creative content with multiple angles | Yes | Diverse perspectives improve quality |
| Complex problems with no clear path | Yes | Exploration prevents premature commitment |
| Simple Q&A or factual retrieval | No | Massive overkill; use Standard |
| Time-sensitive responses | No | Too slow for real-time interaction |
| Budget-constrained workflows | No | Token cost is very high |

Best Practices

  • Start with the defaults. num_thoughts: 3 with depth: 2 explores three approaches with one level of refinement, a good balance of exploration and cost. Increase only after confirming that broader or deeper exploration yields meaningfully better results.
  • Match the evaluation strategy to the task. Pairwise voting is more robust than numeric scoring for creative, strategic, or subjective tasks where absolute scores are hard to calibrate. Use score-based evaluation for more objective, measurable criteria.
  • Raise the temperature. Use temperature 0.7-0.9 for ToT to encourage diverse initial thoughts. If all three initial thoughts are similar, the mode loses its advantage over Chain of Thought.
  • Reserve ToT for high-value decisions. Given the computational cost, use ToT only for decisions or outputs where exploring multiple paths provides clear value: strategy documents, architecture decisions, creative campaigns.
  • Mix modes within a workflow. Use ToT for the critical decision point, and Standard or Chain of Thought for surrounding nodes. For example: Standard for intake, ToT for strategy generation, Reflection for polishing the output.

Next Steps