Revolutionizing Language Models: Teaching Them to Search with the Stream of Search Approach

Large language models (LLMs) have shown impressive progress in natural language understanding and generation, but they fall short on complex reasoning and planning tasks. Current LLMs suffer from compounding errors and have difficulty predicting the consequences of actions several steps ahead. A major reason is that LLMs are typically trained on clean, optimal solutions, with little exposure to the messy, exploratory process of actually searching for a solution.

Researchers from Stanford University and MIT have introduced an approach called Stream of Search (SoS) that teaches LLMs to search. SoS represents the search process in natural language, allowing LLMs to learn from suboptimal search trajectories rather than only from perfect solutions. Each SoS trace spells out the search algorithm in words, including steps like exploring candidate moves, backtracking when a path doesn't work out, and progressively building toward a solution.
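
As a concrete illustration, the sketch below linearizes a depth-first search over a small Countdown-style puzzle into a single text stream. The trace wording and pruning rules here are our own toy choices, not the paper's exact serialization format:

```python
from itertools import permutations

# Combine two numbers with an operation; None marks a move this toy
# version prunes (negative or non-integer intermediate results).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b if a >= b else None,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b if b != 0 and a % b == 0 else None,
}

def dfs(nums, target, trace):
    trace.append(f"State: {sorted(nums)}, target {target}")
    if target in nums:
        trace.append("Goal reached.")
        return True
    # Try every ordered pair of remaining numbers with every operation.
    for (i, a), (j, b) in permutations(list(enumerate(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        for op, fn in OPS.items():
            val = fn(a, b)
            if val is None:
                continue
            trace.append(f"Try {a} {op} {b} = {val}")
            if dfs(rest + [val], target, trace):
                return True
            trace.append(f"Backtrack: {a} {op} {b} did not lead to {target}")
    return False

trace = []
dfs([4, 9, 10, 13], 24, trace)
print("\n".join(trace))  # the whole stream is one training example
```

The key point is that dead ends and backtracking steps stay in the output rather than being edited out, so the training text looks like searching, not like a finished proof.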

By training on these search traces, LLMs can learn to reason step-by-step, consider alternative paths, and recover from mistakes. Crucially, SoS models internalize the search algorithm, rather than relying on an external solver, and learn a “world model” to simulate the effects of different actions themselves.
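
Here is a minimal sketch of what "training on search traces" can look like in practice, assuming a standard Hugging Face causal-LM setup; the model choice (gpt2), the placeholder trace string, and the single-example loop are illustrative, not the paper's configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Each training example is one full serialized search stream, like the
# trace produced above (this string is a stand-in, not real paper data).
traces = ["State: [4, 9, 10, 13], target 24\nTry 10 - 4 = 6\n..."]

model.train()
for text in traces:
    batch = tokenizer(text, return_tensors="pt")
    # labels == input_ids gives the standard shifted next-token loss, so
    # the model learns to generate exploration and backtracking steps,
    # not just the final answer.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```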

The researchers tested SoS on Countdown, a challenging arithmetic puzzle in which a set of input numbers must be combined with basic operations to reach a target number, and achieved impressive results. SoS models outperformed models trained only on optimal solutions by 25%, employing diverse search strategies and heuristics. Moreover, the self-improved SoS models solved 36% of previously unsolved problems, including ones that the standard search algorithms couldn't crack.
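
The "self-improved" models come from iterating on the model's own outputs. Below is a hedged sketch of one simple filter-and-retrain variant of such a loop; `generate`, `solves`, and `fine_tune` are hypothetical helpers standing in for sampling a search stream, checking it against the puzzle, and one round of the training loop above, and this is not presented as the paper's exact procedure:

```python
# `generate`, `solves`, and `fine_tune` are hypothetical stand-ins, not
# functions from the paper or any library.
def self_improve(model, problems, generate, solves, fine_tune, rounds=3):
    kept = []
    for _ in range(rounds):
        for problem in problems:
            stream = generate(model, problem)  # sample a search stream
            if solves(stream, problem):        # keep only correct traces
                kept.append(stream)
        model = fine_tune(model, kept)         # retrain on the filtered set
    return model
```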

These findings demonstrate that core aspects of symbolic reasoning, such as structured search, backtracking, heuristic evaluation, and world modeling, can emerge in neural language models given the right training paradigm. The SoS framework opens up exciting possibilities for more robust reasoning in LLMs: resilience to mistakes, the flexibility to break problems down, the ability to learn diverse search strategies, and the potential to discover novel problem-solving heuristics through self-optimization.
