Unfolding Tree of Thought

Sam Naji, Joseph Tekriti
June 13, 2023

Current prompting techniques allow a language model to take a simple prompt, break it down into steps, and work through them one at a time. However, there is no exploration involved: the model simply follows its instructions in the given order. In Tree of Thought prompting, we let the model generate different thoughts, just as in Chain of Thought, but before moving on to the next thought the model evaluates whether the current thought is likely to lead to the optimal output. Using 'backtracking', the model can always return to an earlier point in its thought process. Combining all of this, the model searches for the best possible solution.
The Tree of Thought prompting technique provides a unique approach to enhancing language models and their ability to solve complex problems intelligently. It offers a novel way to expand upon existing planning techniques by considering multiple potential solutions simultaneously at each step of the problem-solving process. This allows language models to explore different paths and select the most promising ones based on their own evaluations. The integration of thought sampling and value feedback within the Tree of Thought framework enables effective search within the solution space. Unlike traditional decision-making methods that require separate models for rewards and policies, Tree of Thought leverages the language model itself to provide value estimates, making the decision-making process more seamless and efficient. Self-reflection is another integral aspect: language models equipped with Tree of Thought can assess the viability of their own predictions, leading to improved accuracy in tasks such as code generation and decision-making. This introspective ability allows language models to provide valuable feedback on their own performance, enhancing their problem-solving capabilities. The Tree of Thought framework also aligns with program-guided LM generation: by incorporating symbolic program instructions, language models can be guided step by step through complex problems. This expands traditional search methods by utilizing the model's own thoughts rather than relying solely on external sources, fostering a deeper understanding and more intelligent problem-solving.

Input/Output vs Chain of Thought vs Self Consistency with Chain of Thought

  • Input/Output: This is the most basic form of prompt, where the user provides a question, problem, or task to the language model and the model performs the required analysis, creativity, or problem-solving. It may also include simple instructions such as how to format the response. It provides the initial context or problem statement for which we seek solutions or insights. An example prompt would be "Write a story about Heaven on Earth."
  • Chain of Thought: This refers to a sequence of interconnected thoughts, steps, or ideas that stem from the initial prompt or problem statement. It represents the progression of thoughts generated and linked together while exploring different possibilities, considerations, and solutions. Each thought in the chain builds upon the previous one, creating a coherent flow of ideas. Using the previous example, a Chain of Thought prompt would be: "Write a story about Heaven on Earth where the main character accidentally finds a gate within the forest. The gate leads the character to another dimension where they see mythical creatures like talking dragons. The story can have an adventurous plot where the talking dragon is befriended by the main character." This is an initial prompt in Chain of Thought prompting; it can lead to further elaboration and exploration, generating subsequent thoughts, and each thought serves as a building block for the next until the model produces the response the user needed.
  • Chain of Thought with Self Consistency: Self Consistency refers to generating multiple sequences of chains of thought and then choosing the optimal solution based on the most consistent output. One problem can be solved in many different ways, and even when the thought processes differ, the final result may be the same (like proving a physics equation through different experiments); for problems involving creativity and imagination the outputs may differ, but some similarity can usually be found between them. The Self Consistency technique generates several different chains of thoughts (steps) for a problem, each leading to some solution, and the output most consistently generated by the model becomes the optimal output. In our example, the story most commonly generated by the language model would become the optimal output. With this technique, the output decision can be more faithful because it explores a richer set of thoughts. Nevertheless, within each sequence there is no exploration of different thought processes at a local level, and the heuristic of choosing the "most frequent" option only works when the range of possible outputs is restricted.

To summarize: the Input/Output prompt sets the stage for generating thoughts, the Chain of Thought represents the sequence of interconnected ideas, and Self Consistency with Chain of Thought ensures logical coherence and alignment across chains.
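The majority-vote idea behind Self Consistency can be sketched in a few lines of Python. This is a minimal illustration, not any paper's implementation: `fake_lm` is a hypothetical stand-in for sampling a full chain of thought from a real model and returning only its final answer.

```python
from collections import Counter

def self_consistent_answer(sample_fn, prompt, n_samples=5):
    """Sample several independent chains of thought and return the
    most frequent final answer (majority vote) with its vote count."""
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count

# Toy stand-in for an LM call: each "chain of thought" ends in an answer.
def fake_lm(prompt):
    fake_lm.calls += 1
    # Pretend three reasoning paths reach 18 and two reach 24.
    return [18, 24, 18, 24, 18][(fake_lm.calls - 1) % 5]
fake_lm.calls = 0

answer, votes = self_consistent_answer(fake_lm, "Use 4, 9, 10, 13 to make 24")
print(answer, votes)  # → 18 3
```

In practice each call would sample a full reasoning chain at a non-zero temperature; only the final answers are compared, never the intermediate steps, which is exactly why no local exploration of thoughts happens.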

How Does Tree of Thought Work?

The Tree of Thought prompting provides an innovative approach to problem-solving by leveraging the capabilities of language models. In simple terms, it works by allowing the Language Models to consider multiple potential solutions simultaneously at each step of the problem-solving process. This enables the model to explore different paths and select the most promising ones based on their own evaluations.

Different prompting techniques

By now you know the difference between Input/Output prompting, Chain of Thought prompting, and Self Consistency with Chain of Thought. In simple terms, Tree of Thought prompting can be illustrated, as above, as a tree-like structure: potential thoughts are first generated from the input, and each thought is evaluated by the language model itself. Based on this self-evaluation and the search algorithm being used, the model progresses to the next relevant thought. If a thought is judged unlikely to generate the optimal output, it is discarded; if it is judged likely to, the model breaks that thought down into further thoughts, evaluates them, and repeats this process until it finds the most optimal solution. Below is a detailed step-by-step explanation, with an example, of how Tree of Thought works.

  1. Thought Decomposition: The problem at hand is broken down into smaller, more manageable sub-problems. This decomposition allows the Language Model to focus on solving each sub-problem independently, making the overall problem-solving process more efficient. By decomposing the problem into manageable pieces, the LM can generate a tree-like structure that represents the different paths it can explore.
  2. Thought Generator: Once the problem is decomposed, the Language Model utilizes its language generation capabilities to generate potential solutions or thoughts for each sub-problem. The thought generator aspect of the Tree of Thought framework leverages the LM's vast knowledge and training on a wide range of data to propose possible solutions or actions for each sub-problem. The LM generates multiple thoughts or options for each sub-problem, which helps to explore different paths towards the solution.
  3. State Evaluator: After generating thoughts for each sub-problem, the next step involves evaluating the quality or feasibility of these thoughts. The state evaluator component assesses the generated thoughts based on the problem setting and the current state of the problem-solving process. The evaluator takes into account factors like coherence, relevance, and effectiveness to determine the potential value of each thought. By evaluating the thoughts, the Language Model can identify the most promising options to proceed further.
  4. Search Algorithm: The final step of the Tree of Thought framework involves employing a search algorithm to explore and navigate through the generated thoughts. The search algorithm helps the Language Model traverse the tree-like structure created earlier, considering multiple paths and potential solutions. By following the search algorithm, the LM can effectively navigate through the solution space, identifying the most promising thoughts and discarding less favorable ones. Here we discuss two simple tree search algorithms that can be used for Tree of Thought.
  • Breadth First Search Algorithm: The Breadth First Search (BFS) algorithm is a graph traversal technique that explores all the vertices of a graph or tree in a breadth-wise manner: it first traverses the entire first layer of the tree, then moves on to the next layer.
  • Depth First Search Algorithm: The Depth First Search (DFS) algorithm is another graph traversal technique, exploring the vertices of a graph or tree in a depth-wise manner. We start from the root node and continue down each branch as far as possible before backtracking. This means that we explore thoughts in a depth-first manner before moving on to other branches.
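The four steps above can be sketched as a single search loop. The following is a minimal, hypothetical Python sketch of a BFS-style Tree of Thought: `generate_thoughts` (the thought generator) and `score_thought` (the state evaluator) are simple stubs standing in for LM calls, here building a target string one character at a time so the control flow is runnable.

```python
def generate_thoughts(state):
    """Thought generator (stub): propose candidate next thoughts
    by appending one character to the current partial state."""
    return [state + d for d in "012"]

def score_thought(state, goal):
    """State evaluator (stub): value estimate in [0, 1], here the
    fraction of leading characters that match the goal."""
    matches = sum(1 for a, b in zip(state, goal) if a == b)
    return matches / len(goal)

def tot_bfs(root, goal, breadth=2, depth=3):
    """Explore layer by layer, keeping only the `breadth` best
    thoughts at each level and discarding the rest."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in generate_thoughts(s)]
        candidates.sort(key=lambda s: score_thought(s, goal), reverse=True)
        frontier = candidates[:breadth]   # prune unpromising thoughts
        if goal in frontier:
            return goal
    return frontier[0]

print(tot_bfs("", goal="210"))  # → 210
```

The `breadth` parameter plays the role of the branching limit: at every layer only the best-scoring thoughts survive, which is the prune-and-continue behavior described above. Swapping the sorted frontier for a stack would give the DFS variant.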

To understand how the Tree of Thought works, let's consider the example of the "Game of 24". The Game of 24 is a mathematical reasoning challenge where the goal is to use 4 numbers and basic arithmetic operations (+ - * /) to obtain 24. We will use the numbers 4, 9, 10, and 13 for our example:

  1. In the thought decomposition stage, we break down the problem of finding an expression that evaluates to 24 into smaller sub-problems. One possible sub-problem could be combining the numbers 4, 9, 10, and 13 using only the addition, subtraction, and multiplication operations, excluding division. By decomposing the problem, the Tree of Thought creates branches, each representing a different combination of operations and numbers that can be used to reach the goal of 24.
  2. Next, the thought generator generates various thoughts or potential expressions for each sub-problem. In our example, one good sample thought is (10 - 4) * (13 - 9): a potential expression that combines the four numbers using subtraction and multiplication to produce 24. On the other hand, a bad sample thought could be 13 + 4 + 10 - 9, which does not achieve the desired result. The thought generator explores different combinations of operations and numbers to generate a set of potential expressions.
  3. The state evaluator examines the validity of the generated thoughts: it evaluates whether each thought's expression results in the target value of 24. Consider our good sample thought (10 - 4) * (13 - 9). Although we already know this thought is good because it evaluates to 24, it is at this step that the language model evaluates it and determines whether it is good or bad. Conversely, the bad sample thought 13 + 4 + 10 - 9 will be evaluated and determined not to achieve the target value; until this step, the language model did not know which thoughts were good and which were bad. The state evaluator considers the correctness of the mathematical operations and the accuracy of the final result when evaluating each thought.
  4. The search algorithm guides the exploration of thoughts to find the optimal solution. It traverses through the tree-like structure created by the thoughts generator, considering the validity and potential of each thought, and selecting the most promising paths that lead to a valid expression evaluating to 24. By following the search algorithm, the Tree of Thought effectively explores different paths, discarding less favorable thoughts and identifying the expressions that yield the desired target value.
  • Breadth First Search Algorithm: If we use BFS for our example of 4, 9, 10, and 13, it explores thoughts at each level before moving to the next level, so it examines thoughts like (4+9), (4-9), (4*9), and (4/9), and then moves to the next level to explore thoughts that combine the next number. This process continues until a valid expression that evaluates to 24 is found or all possible thoughts have been evaluated.
  • Depth First Search Algorithm: DFS instead explores the deepest levels of an initial thought first. We might start with a thought like (4*9), then explore its child thoughts, such as ((4*9)-13), before backtracking and exploring other branches, such as (10+13). This process continues, going deeper into each branch until a valid expression is found or all thoughts have been evaluated.
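For the Game of 24 itself, the search can be written out exhaustively: a depth-first search that repeatedly replaces two numbers with the result of an arithmetic operation until one number remains. Note this sketch is a plain DFS solver, not an LM-driven one; in the actual Tree of Thought setup, the candidate operations would be proposed by the thought generator and pruned by the state evaluator rather than enumerated.

```python
from itertools import permutations

def solve24(nums, eps=1e-6):
    """Depth-first search: repeatedly combine two numbers with + - * /
    until one number remains; succeed if that number equals 24."""
    if len(nums) == 1:
        return abs(nums[0] - 24) < eps
    for a, b, *rest in permutations(nums):
        candidates = [a + b, a - b, a * b]
        if abs(b) > eps:                  # avoid division by zero
            candidates.append(a / b)
        for value in candidates:          # branch on each arithmetic thought
            if solve24([value] + rest):   # go deeper before backtracking
                return True
    return False

print(solve24([4, 9, 10, 13]))  # → True, e.g. (10 - 4) * (13 - 9)
print(solve24([1, 1, 1, 1]))    # → False
```

Each recursive call corresponds to going one level deeper in the thought tree, and the early `return True` is the backtracking search stopping as soon as a valid expression is found.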

Through ‘Thought Decomposition’, ‘Thoughts Generation’, ‘State Evaluation’, and ‘Search Algorithm’, the framework enables the exploration and identification of suitable expressions. It breaks down the problem into sub-problems, generates potential thoughts, evaluates their validity, and searches for the optimal solution.

Why Tree of Thought?

The Tree of Thought prompting technique is a significant step towards the future of Large Language Models. So why should prompt engineers use it?

  • Planning and decision making: The process of organizing and strategizing thoughts to achieve specific goals. It involves breaking down complex problems into smaller thoughts and evaluating potential solutions based on predefined objectives.
  • Self-reflection: The ability of Language Models to assess the viability of their own predictions. Language Models provide feedback on their generated candidates, allowing for critical evaluation and refinement.
  • Program-guided LLM generation: Incorporates symbolic program guidance to shape the behavior of Language Models. It involves using external programs or self-generated thoughts to expand the search space and improve problem-solving capabilities.
  • Classical search methods: Applying traditional search algorithms, such as BFS, to navigate the thought space and find optimal solutions based on the Language Model's self-assessment.

The following graph shows how the success rate of Large Language Models differs based on the prompting technique used to solve the same 'Game of 24' problem. As can be seen from the graph, the Input/Output, Chain of Thought, and Chain of Thought with Self Consistency prompting methods had low success rates on the task, achieving only 7.3%, 4.0%, and 9.0% respectively. In contrast, the Tree of Thought method with a breadth of b = 1 achieved a success rate of 45%, while b = 5 achieved 74% ('b' is the branching factor of the search; it represents the maximum number of successors or children that a node can have in the search tree).

This technique is still in its early stages of development and therefore has some limitations. It may not be necessary for tasks that LMs already excel at, as deliberate search methods like Tree of Thought are primarily designed for complex problem-solving scenarios. The resources required for Tree of Thought, such as the GPT-4 API cost, may be higher compared to sampling methods. However, ongoing open-source efforts aim to reduce these costs in the future. Additionally, the use of off-the-shelf LMs limits the problem-solving capabilities, and fine-tuning LMs using high-level counterfactual decision-making approaches could enhance their performance. It is essential to explore more real-world applications and study the research questions associated with deploying LMs in decision-making contexts.

As Large Language Models(LMs) continue to be deployed in various real-world decision-making applications, more complex tasks may emerge, prompting the exploration of better search and planning abilities. The modular flexibility of Tree of Thought(ToT) allows for customization of performance-cost tradeoffs, and ongoing open-source efforts are expected to reduce resource requirements. Fine-tuning LMs using a ToT-style high-level counterfactual decision-making approach could enhance their problem-solving capabilities. However, it is crucial to consider the broader impact of ToT, ensuring ethical and responsible use to prevent potential harm. Furthermore, the combination of LMs with classical AI approaches offers exciting avenues for future research, expanding the capabilities of both systems.

Acknowledgment: This guide was skillfully crafted with the help of Saud M.
