When OpenAI unveiled its o3 “reasoning” AI model in December, it partnered with the team behind ARC-AGI, a benchmark designed to test advanced AI, to showcase the model’s capabilities. Months later, the results have been revised, and they suggest that o3 may cost far more to run than initially thought.
Higher Costs Than Expected
The Arc Prize Foundation, which maintains ARC-AGI, recently revised its estimate of o3’s computing costs. Initially, it estimated that o3 high, the highest-compute configuration, cost around $3,000 to solve a single ARC-AGI problem. The estimate has now risen tenfold, to about $30,000 per task.
This revision highlights just how expensive advanced AI models can be, especially in their early stages. OpenAI has not yet released o3 or set its official price. However, the Arc Prize Foundation believes that the pricing for OpenAI’s o1-pro model might be a good reference for o3’s cost.
Context on the o1-pro Model
o1-pro is currently OpenAI’s most expensive model. Mike Knoop, co-founder of the Arc Prize Foundation, explained, “We believe o1-pro is a closer comparison of true o3 cost, given the test-time compute used.” He cautioned, however, that this is still an estimate, and that o3 will remain labeled as “preview” until official pricing is announced.
The Computing Power Behind o3
A high price for o3 wouldn’t be surprising, considering its computing demands. The Arc Prize Foundation reports that o3 high uses 172 times more computing power than o3 low. This significant increase in computing requirements is a key reason behind the higher cost.
Rumors of High-Cost Enterprise AI Agents
Rumors about OpenAI’s plans for expensive AI solutions for enterprise customers have been circulating for months. In early March, The Information reported that OpenAI may charge up to $20,000 per month for specialized AI agents. These agents would perform tasks like software development.
Cost vs. Human Contractors
Some may argue that even OpenAI’s most expensive models will be cheaper than hiring human contractors. However, AI researcher Toby Ord raised concerns about the efficiency of these models. On X, he pointed out that o3 high required 1,024 attempts at each task in ARC-AGI to achieve its best score. In other words, even if such a model undercuts a human contractor on price, it is not necessarily efficient: its best score depended on more than a thousand tries per task.
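The figures quoted in this article lend themselves to a quick back-of-the-envelope check. The Python sketch below computes how much the estimate grew and what a single attempt might cost. It assumes that the $30,000 per-task estimate covers all 1,024 attempts and that every attempt costs the same. Neither assumption is confirmed by the Arc Prize Foundation or OpenAI, and the function names are illustrative only.

```python
# Figures quoted in the article. These are estimates, not official pricing.
ORIGINAL_COST_PER_TASK = 3_000   # USD, initial estimate for o3 high
REVISED_COST_PER_TASK = 30_000   # USD, revised estimate for o3 high
ATTEMPTS_PER_TASK = 1_024        # attempts per task, as cited by Toby Ord


def revision_factor(original: float, revised: float) -> float:
    """Return how many times larger the revised estimate is."""
    return revised / original


def cost_per_attempt(cost_per_task: float, attempts: int) -> float:
    """Return the average cost of one attempt.

    Assumption (not from the source): the per-task estimate is spread
    evenly across all attempts.
    """
    return cost_per_task / attempts


if __name__ == "__main__":
    factor = revision_factor(ORIGINAL_COST_PER_TASK, REVISED_COST_PER_TASK)
    per_attempt = cost_per_attempt(REVISED_COST_PER_TASK, ATTEMPTS_PER_TASK)
    print(f"Estimate grew by a factor of {factor:.0f}")
    print(f"Approximate cost per attempt: ${per_attempt:.2f}")
```

Under those assumptions, the revision is tenfold and each attempt works out to roughly $29, which illustrates how a modest per-attempt cost can add up to a very large per-task bill.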