A revolutionary 27M-parameter AI model that performs complex sequential reasoning in a single forward pass. Featuring dual recurrent modules, one for slow, high-level planning and one for fast, low-level detail, it outperforms much larger models on Sudoku, maze, and ARC-style puzzle challenges.
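To make the "dual recurrent modules" idea concrete, here is a minimal sketch of the nested update pattern the paper describes: a fast low-level recurrent module runs several steps for every one step of a slow high-level module, all inside a single forward pass. This is not the authors' implementation; the module sizes, the step counts (`n_cycles` high-level updates, `t_low_steps` low-level steps per cycle), and the use of plain GRU cells are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualTimescaleReasoner(nn.Module):
    """Illustrative sketch of an HRM-style nested recurrence (not the paper's code).

    A slow high-level module updates once per cycle; a fast low-level module
    updates t_low_steps times per cycle, conditioned on the current high-level
    state. Cell types and dimensions are assumptions for demonstration.
    """

    def __init__(self, input_dim: int, hidden_dim: int = 128,
                 n_cycles: int = 4, t_low_steps: int = 8):
        super().__init__()
        self.n_cycles = n_cycles        # slow, high-level updates
        self.t_low_steps = t_low_steps  # fast, low-level updates per cycle
        # Low-level cell sees the input plus the current high-level state.
        self.low = nn.GRUCell(input_dim + hidden_dim, hidden_dim)
        # High-level cell sees the refined low-level state at the end of a cycle.
        self.high = nn.GRUCell(hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, input_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.size(0)
        z_low = x.new_zeros(batch, self.low.hidden_size)
        z_high = x.new_zeros(batch, self.high.hidden_size)
        for _ in range(self.n_cycles):
            # Fast timescale: refine details under the current "plan".
            for _ in range(self.t_low_steps):
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            # Slow timescale: update the plan from the refined details.
            z_high = self.high(z_low, z_high)
        return self.readout(z_high)

model = DualTimescaleReasoner(input_dim=32)
out = model(torch.randn(2, 32))  # one forward call, n_cycles * t_low_steps steps inside
print(out.shape)  # torch.Size([2, 32])
```

The point of the pattern is that all of the sequential computation happens inside a single forward pass, rather than being spread across generated chain-of-thought tokens.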
Based on a quick first skim of the abstract and the introduction, the results from the Hierarchical Reasoning Model (HRM) look incredible:
> Using only 1,000 input-output examples, without pre-training or CoT supervision, HRM learns to solve problems that are intractable for even the most advanced LLMs. For example, it achieves near-perfect accuracy in complex Sudoku puzzles (Sudoku-Extreme Full) and optimal pathfinding in 30x30 mazes, where state-of-the-art CoT methods completely fail (0% accuracy). In the Abstraction and Reasoning Corpus (ARC) AGI Challenge [27,28,29], a benchmark of inductive reasoning, HRM, trained from scratch with only the official dataset (~1000 examples), with only 27M parameters and a 30x30 grid context (900 tokens), achieves a performance of 40.3%, which substantially surpasses leading CoT-based models like o3-mini-high (34.5%) and Claude 3.7 8K context (21.2%), despite their considerably larger parameter sizes and context lengths, as shown in Figure 1.
From the discussion on Hacker News: