The rapid progress of AI reasoning models may soon slow, according to a new analysis by Epoch AI, a nonprofit research institute tracking artificial intelligence developments. The report suggests that within about a year, the dramatic performance leaps from today's most advanced reasoning models could begin to taper off, raising questions about the long-term scalability of the technology.
Reasoning models like OpenAI's o3 have recently delivered impressive gains on AI benchmarks, especially in math and programming. Unlike conventional large language models, they work through a problem step by step before answering. Labs achieve this by applying reinforcement learning after the initial training run, essentially letting the model critique and refine its own responses based on feedback about which answers were correct. The approach improves accuracy, but at the cost of speed and much heavier compute.
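For readers curious what that feedback loop looks like in miniature, here is a deliberately tiny sketch (illustrative only, with made-up numbers, and not any lab's actual pipeline): a toy "model" picks between two problem-solving strategies, an automatic checker rewards correct answers, and a gradient-bandit-style update shifts probability toward whichever strategy earns more reward.

```python
import math
import random

# A minimal, illustrative sketch of reinforcement-learning-style post-training
# (toy numbers; not any lab's actual method). The "model" chooses between two
# problem-solving strategies; a checker rewards correct answers; a gradient
# bandit update shifts probability toward the strategy that earns more reward.

prefs = [0.0, 0.0]         # preference scores for strategy A and strategy B
success_rate = [0.3, 0.9]  # assumed chance each strategy solves a problem
learning_rate = 0.1
avg_reward = 0.0           # running baseline, as in a standard gradient bandit

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(1, 2001):
    probs = softmax(prefs)
    choice = 0 if random.random() < probs[0] else 1
    reward = 1.0 if random.random() < success_rate[choice] else 0.0

    avg_reward += (reward - avg_reward) / step
    # Reinforce the chosen strategy when it beat the running average reward.
    for i in range(2):
        indicator = 1.0 if i == choice else 0.0
        prefs[i] += learning_rate * (reward - avg_reward) * (indicator - probs[i])

print("final strategy probabilities:", [round(p, 2) for p in softmax(prefs)])
```

The real thing swaps this two-armed toy for a full language model scored against verifiable answers in math and code, which is where the heavy compute bill comes from.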
Until recently, most AI labs hadn’t dedicated a massive amount of compute to this reinforcement learning stage. But that’s changing fast. OpenAI has already revealed that it used about 10 times more computing power to train o3 than its predecessor, o1. Analysts believe much of that increase was funneled into the reasoning stage rather than the initial training.
According to OpenAI researcher Dan Roberts, future models will likely lean even more heavily on this reinforcement learning approach. But Epoch warns there is a ceiling: only so much compute can be thrown at these systems before technical and financial limits kick in.
Scaling Limits Loom for Reasoning Models
Epoch analyst Josh You explains that while performance from conventional model training has been doubling every year, gains from reinforcement learning have been growing tenfold every three to five months. That explosive pace won't last forever, however. By 2026, the performance curve of reasoning models is expected to converge with the broader frontier of AI progress, closing the gap that once made them exceptional.
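A back-of-envelope calculation makes the convergence argument concrete. The starting values below are invented, and the quoted growth rates are applied to training-compute budgets as a rough stand-in for Epoch's actual figures; the point is simply that anything growing tenfold every few months exhausts any fixed head start within about a year, after which it can only grow as fast as the overall frontier does.

```python
# Back-of-envelope illustration with made-up starting values: assume the RL
# stage begins ~1,000x smaller than the total frontier training budget, grows
# 10x every 4 months, while the overall budget doubles roughly once a year
# (the rates quoted above, treated here as compute growth rates).

rl_compute = 1.0            # arbitrary units; assumed starting point
frontier_compute = 1_000.0  # assume the RL stage starts ~1,000x smaller

month = 0
while rl_compute < frontier_compute:
    share = rl_compute / frontier_compute
    print(f"month {month:2d}: RL stage = {share:6.1%} of the frontier budget")
    rl_compute *= 10                   # 10x every 4 months
    frontier_compute *= 2 ** (4 / 12)  # doubling per year, compounded per 4 months
    month += 4

print(f"month {month:2d}: RL stage reaches the frontier budget; "
      "further gains can only track overall compute growth")
```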
Beyond computing limitations, other hurdles could stall momentum. The cost of research—especially at the cutting edge—isn’t trivial. These reasoning systems are not only compute-heavy but also resource-intensive to test, validate, and fine-tune. That overhead, Epoch argues, may eventually become a bottleneck of its own.
“There’s a real chance that the scaling of reasoning models will hit a wall,” You writes in the report. “If compute efficiency plateaus or research overhead stays high, progress could slow dramatically.”
This has broader implications for the AI industry. Companies like OpenAI, Anthropic, and Cohere have funneled enormous resources into developing reasoning models in hopes of creating more useful, reliable, and adaptable AI systems. Yet even with soaring compute budgets and talent investments, reasoning models still face persistent flaws—such as generating hallucinated or false answers more frequently than traditional models.
As the race to build smarter AI continues, the possibility that reasoning AI may soon hit a ceiling could reshape investment strategies and research priorities. For now, though, the industry continues to bet big on pushing these models as far as current tools and infrastructure will allow.