The following text is an excerpt from Ashutosh Shrivastava’s post:
Sapient released their Hierarchical Reasoning Model (HRM) and the results are pretty interesting. This is a 27M parameter model that outperforms Claude 3.5 and o3-mini on reasoning benchmarks like ARC-AGI-2, complex Sudoku puzzles, and pathfinding in large mazes.
What makes this notable:
The efficiency aspect is striking. HRM was trained on roughly 1000 examples with no pretraining or Chain-of-Thought prompting, yet it handles complex reasoning tasks that typically require much larger models. This makes it practical for deployment on edge devices and accessible for teams without massive compute budgets.
The brain-inspired architecture is more than just terminology. HRM uses a dual-system design with two modules: one for high-level abstract planning and another for rapid detailed execution, operating at different time scales. This mirrors how human cognition works with both fast intuitive processing and slower deliberate reasoning.
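To make the dual-timescale idea concrete, here is a minimal PyTorch sketch of two coupled recurrent modules, a slow planner and a fast executor. The module names, sizes, and update schedule are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch (not the paper's exact equations) of HRM's core idea:
# a slow, high-level module that updates once per cycle, and a fast,
# low-level module that runs several steps per cycle, conditioned on
# the high-level state. All names and sizes here are illustrative.
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, in_dim=64, low_dim=128, high_dim=128, t_low=4):
        super().__init__()
        self.t_low = t_low  # fast steps per slow update
        self.low = nn.GRUCell(in_dim + high_dim, low_dim)   # fast "execution" module
        self.high = nn.GRUCell(low_dim, high_dim)           # slow "planning" module

    def forward(self, x, n_cycles=3):
        b = x.size(0)
        z_l = x.new_zeros(b, self.low.hidden_size)
        z_h = x.new_zeros(b, self.high.hidden_size)
        for _ in range(n_cycles):            # slow timescale
            for _ in range(self.t_low):      # fast timescale
                z_l = self.low(torch.cat([x, z_h], dim=-1), z_l)
            z_h = self.high(z_l, z_h)        # planner updated from the executor's result
        return z_h
```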
The low-resource requirement changes the accessibility equation. While most advanced AI requires significant infrastructure, HRM can run on regular hardware, opening up sophisticated reasoning capabilities to startups and researchers who can’t afford large-scale compute.
You can read the paper here: [2506.21734] Hierarchical Reasoning Model
This is so interesting. Have you figured out how to do something with it? If this model scales it will change the world. A 27-million-parameter model that trains on about 1000 examples.
Reminds me of some older work by Numenta: Hierarchical temporal memory - Wikipedia
Welcome to the Forum, @vamsi, and thanks for sharing this! HRM is certainly an intriguing architecture.
From the Thousand Brains perspective, we agree that hierarchical organization and temporal abstraction are important for modeling sequences. And while HRM is closer to our notion of hierarchy than, say, a stack of CNN layers or transformer blocks, there are still fundamental differences.
In TBT, the different levels of the hierarchy represent compositional objects, where higher-level objects are composed of reusable lower-level ones (e.g., a logo on a cup); a toy sketch after the excerpt below illustrates this. This is different from a recurrent deep learning architecture operating at different fixed timescales.
The following is from our “Hierarchy or Heterarchy” paper:
Columns in each region learn structured models, up to, and including, complete objects. We propose that the role of the hierarchical connections between columns is to learn compositional models, that is objects that are composed of other objects. Most of the world is structured this way. For example, a bicycle is composed of a set of other objects, such as wheels, frame, pedals, and seat, arranged relative to each other. Each of these objects, such as a wheel, is itself composed of other objects such as tire, rim, valve, and spokes. In another example, words are composed of syllables, which are themselves composed of letters. And finally, an example that we often use in our research, a coffee mug may have a logo printed on its side. The logo is an object that was previously learned, but in this example the logo is a component of the coffee mug. Learning compositional objects can occur rapidly, with just a few visual fixations. This tells us that the neocortex does not have to relearn the component objects; the neocortex only has to form links between two existing models.
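To make the compositional idea concrete, here is a toy sketch (our illustration, not code from the paper or from Monty) of how learning a new composite object can amount to linking existing child models at relative poses, rather than relearning the children:

```python
# Toy sketch of compositional object models: a new object is learned by
# *linking* previously learned child models at relative poses.
from dataclasses import dataclass, field

@dataclass
class ObjectModel:
    name: str
    # (child model, pose of the child in this object's reference frame)
    parts: list[tuple["ObjectModel", tuple[float, float, float]]] = field(default_factory=list)

# Previously learned, reusable models.
logo = ObjectModel("logo")
mug_body = ObjectModel("mug_body")
handle = ObjectModel("handle")

# Learning "mug with logo" only requires forming links between existing models.
mug_with_logo = ObjectModel("mug_with_logo", parts=[
    (mug_body, (0.0, 0.0, 0.0)),
    (handle, (0.05, 0.0, 0.0)),
    (logo, (0.0, 0.03, 0.04)),   # the logo becomes a component of the mug
])
```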
Beyond the hierarchical processing, one key concept that’s still missing in many architectures, including HRM, is that of reference frames. In TBT, reference frames are at the core of intelligence and are used within every cortical column. They allow for structured learning of environments and the objects within. An agent uses these reference frames to represent features in space, learn spatial relations of features and objects to each other, plan and apply movements, and make predictions. Without these reference frames, models often rely on statistical correlation rather than structured models of the world to complete their tasks.
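As a rough illustration of what a reference frame buys you (a toy sketch, not Monty's actual API), consider a model that stores features at locations in an object-centric frame and uses movement to predict what it will sense next:

```python
# Toy sketch of the reference-frame idea: features are stored at locations
# in an object-centric frame, so a movement lets the model *predict* the
# feature at the new location from structure, not statistical correlation.
import numpy as np

class ReferenceFrame:
    def __init__(self):
        self.features = {}  # location (tuple) -> feature id

    def learn(self, location, feature):
        self.features[tuple(np.round(location, 3))] = feature

    def predict(self, current_location, movement):
        """Predict the sensed feature after applying a movement vector."""
        new_loc = tuple(np.round(np.asarray(current_location) + movement, 3))
        return self.features.get(new_loc)

# Learn two features of an object in its own reference frame...
rf = ReferenceFrame()
rf.learn([0.0, 0.0, 0.0], "rim")
rf.learn([0.0, -0.1, 0.0], "handle")

# ...then predict what a downward movement from the rim should produce.
print(rf.predict([0.0, 0.0, 0.0], np.array([0.0, -0.1, 0.0])))  # -> "handle"
```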
We also take a different stance on deep learning itself. Monty doesn’t use deep learning at all, not only for performance reasons, but because we believe it’s not how the brain works. The learning mechanisms are fundamentally different. We discuss this more here.
Here are more thoughts on this: