Research · April 12, 2026 · 6 min read

Latent Reasoning Increases Intelligence Per Parameter

How looped language models and latent planning are redefining the relationship between model size and reasoning capability.

[Figure: Latent Reasoning Visualization]

Two recent papers from leading AI research groups have revealed something remarkable: the way we use language models matters as much as their size. By enabling models to perform latent reasoning—thinking in their internal representation space before producing output—we can achieve significantly higher intelligence per parameter.

The Depth Ceiling Problem

Traditional language models face a fundamental limitation identified by Xu, Jettkant, and Ruis in their paper "The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning". When models are trained to output reasoning directly (as in chain-of-thought), they hit a "depth ceiling": a point beyond which neither additional scale nor fine-tuning increases the depth of the latent planning strategies they can discover.

"tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under fewshot prompting."

— Xu et al., "The Depth Ceiling" (2026)

The issue is that models must commit to reasoning steps in natural language, which is inefficient and error-prone. Each token generated consumes computational resources and introduces opportunities for mistakes. This creates a bottleneck: deeper reasoning requires more tokens, which means more computation and higher error rates.
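The error-compounding half of this bottleneck is easy to quantify. If each emitted reasoning token is correct with some probability, the chance that an entire chain survives decays exponentially with its length. The numbers below are illustrative assumptions of ours, not measurements from either paper:

```python
# If each emitted reasoning token is correct with probability p, a chain of
# T tokens survives with probability p**T: errors compound multiplicatively.
# p = 0.99 is a hypothetical per-token accuracy, chosen for illustration.

def chain_success(p_per_token: float, num_tokens: int) -> float:
    """Probability that every token in a T-step reasoning chain is correct."""
    return p_per_token ** num_tokens

for T in (10, 50, 200):
    print(T, round(chain_success(0.99, T), 3))  # 10 -> 0.904, 200 -> 0.134
```

Even at 99% per-token accuracy, a 200-token chain succeeds only about 13% of the time, which is why longer natural-language reasoning traces are both costlier and more fragile.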

The researchers found that "discovery fails abruptly at four steps and beyond" for small transformers, and even scaling to frontier models like GPT-4o only increases the discovered planning depth from three steps to "only four or five, with failure beyond that."

Enter Looped Language Models

The solution comes from Zhu et al.'s groundbreaking work "Scaling Latent Reasoning via Looped Language Models". Instead of outputting reasoning steps as text, looped models perform reasoning in their latent space—the high-dimensional vector representations that encode meaning.

"We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens."

— Zhu et al., "Scaling Latent Reasoning via Looped Language Models" (2025)

Here's how it works: the model processes input, updates its internal state through multiple "loops" or iterations, and only produces output after sufficient reasoning has occurred. This is analogous to a human thinking silently before speaking, rather than thinking aloud.
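The loop-then-speak idea can be sketched in a few lines. The code below is a conceptual toy, not the Ouro architecture: the shared weight matrix, the residual update, and the `exit_gate` head (standing in for the paper's learned depth allocation) are all our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, illustrative only.
HIDDEN = 16
MAX_LOOPS = 8

# One shared block of weights, reused at every loop iteration.
W = rng.normal(scale=0.2, size=(HIDDEN, HIDDEN))
exit_gate = rng.normal(scale=0.2, size=HIDDEN)  # hypothetical learned exit head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def looped_forward(h, max_loops=MAX_LOOPS, exit_threshold=0.9):
    """Refine the hidden state by reapplying the same block, stopping
    early when the (hypothetical) exit gate is confident enough."""
    for step in range(1, max_loops + 1):
        h = np.tanh(W @ h) + h           # shared-weight update with residual
        p_exit = sigmoid(exit_gate @ h)  # learned depth allocation (sketch)
        if p_exit > exit_threshold:
            break
    return h, step

h0 = rng.normal(size=HIDDEN)
h_final, depth = looped_forward(h0)
print(f"looped for {depth} steps, final norm {np.linalg.norm(h_final):.2f}")
```

The key property is that depth is now a runtime decision: the same weights can loop once for an easy input or many times for a hard one, and no tokens are emitted until the loop exits.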

Key Insight

Latent reasoning decouples computation from output. A model can perform hundreds of reasoning steps internally while producing only a few tokens of output. This dramatically improves efficiency and accuracy.

Intelligence Per Parameter

The implications are profound. By using latent reasoning, smaller models can achieve reasoning capabilities that previously required much larger models. This increases "intelligence per parameter"—a measure of how much reasoning capability you get for each parameter in the model.

"Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks."

— Zhu et al., "Scaling Latent Reasoning via Looped Language Models" (2025)

The research shows that looped models can:

  • Perform deeper reasoning without hitting the depth ceiling
  • Achieve better accuracy on complex reasoning tasks
  • Reduce computational costs by minimizing token generation
  • Scale reasoning capability more efficiently with model size

The authors demonstrate that "1.4B and 2.6B parameter LoopLMs match 4B and 8B standard transformers on most benchmarks, yielding 2-3× parameter-efficiency gains"—a massive improvement that makes powerful AI accessible on much smaller hardware.
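The quoted efficiency gains follow directly from the matched model sizes. A quick sanity check of the arithmetic (the ratio is our back-of-envelope reading of the reported pairings):

```python
# Parameter-efficiency ratio implied by the quoted results: a LoopLM that
# matches a larger standard transformer yields (standard / loop) gains.

def param_efficiency(matched_params_b: float, loop_params_b: float) -> float:
    """Ratio of the matched standard model's parameters to the LoopLM's."""
    return matched_params_b / loop_params_b

print(round(param_efficiency(4.0, 1.4), 1))  # 1.4B matching 4B -> 2.9
print(round(param_efficiency(8.0, 2.6), 1))  # 2.6B matching 8B -> 3.1
```

Both ratios land at roughly 3x, consistent with the paper's "2-3× parameter-efficiency gains".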

Knowledge Manipulation vs. Knowledge Storage

One of the most fascinating findings from the looped LM research is that the performance gains don't come from storing more knowledge—they come from better manipulation of existing knowledge.

"Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities."

— Zhu et al., "Scaling Latent Reasoning via Looped Language Models" (2025)

This distinction is crucial. Traditional scaling focuses on knowledge storage—memorizing more facts and patterns. Latent reasoning focuses on knowledge manipulation—the ability to combine, transform, and apply knowledge in novel ways. The latter is what enables true reasoning and problem-solving.
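The storage-versus-manipulation distinction can be made concrete with a toy fact store. This is our own construction, not an experiment from the paper: the store is fixed, and what varies is whether facts are merely retrieved or composed across hops:

```python
# Toy illustration: a fixed knowledge store. Capability differs in how
# facts are *composed*, not in how many facts are memorized.

FACTS = {
    ("Paris", "capital_of"): "France",
    ("France", "continent"): "Europe",
}

def lookup(entity, relation):
    # "Knowledge storage": single-hop retrieval of a memorized fact.
    return FACTS.get((entity, relation))

def compose(entity, relations):
    # "Knowledge manipulation": chaining retrievals through intermediate
    # entities. The same store answers questions no single fact contains.
    for rel in relations:
        entity = lookup(entity, rel)
        if entity is None:
            return None
    return entity

print(compose("Paris", ["capital_of", "continent"]))  # -> Europe
```

No fact in the store says which continent Paris is on; only the two-hop composition produces it. Extra latent loops buy a model more of exactly this kind of multi-hop composition over a fixed set of stored facts.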

Why This Matters for On-Device AI

At Atman, we're particularly excited about these findings because they align perfectly with our mission of bringing powerful AI to edge devices. Latent reasoning makes it possible to run sophisticated AI models on consumer hardware—phones, laptops, and even smaller devices—without sacrificing capability.

By maximizing intelligence per parameter, we can:

  • Run capable models on devices with limited memory and compute
  • Reduce energy consumption and battery drain
  • Maintain privacy by keeping all computation local
  • Enable offline functionality without cloud dependency

The Future of Efficient Intelligence

The research on latent reasoning represents a paradigm shift in how we think about AI capability. Rather than simply scaling up model size, we're learning to use models more intelligently. This is crucial for making AI accessible, sustainable, and privacy-preserving.

As we continue developing Atman Core and our suite of on-device AI tools, these insights will guide our approach. We believe the future of AI isn't just about bigger models—it's about smarter, more efficient models that can run anywhere.

Want to be part of the future of on-device AI? Join our waitlist to get early access to Atman Core.

