Anthropic faces scrutiny over hidden costs in Claude Sonnet 5

Anthropic has released Claude Sonnet 5, which keeps the same nominal token pricing as its predecessor while significantly increasing actual operating costs. Independent testing by Artificial Analysis shows that the new model consumes far more tokens to complete complex tasks than previous versions. This points to a pattern of hidden price inflation, in which more verbose, more agentic model behavior and tokenizer changes offset the appearance of stable rates for developers.

According to The Decoder, independent evaluations of Anthropic's latest offering, Claude Sonnet 5, reveal a discrepancy between official price lists and real-world expenditures. While the company's per-token prices remain unchanged from the previous generation, the data suggests that users may face significantly higher bills due to increased model verbosity and agentic behavior.

Hidden cost inflation in token usage

On paper, Anthropic has kept the pricing for Sonnet 5 consistent with its predecessor at $3 per million input tokens and $15 per million output tokens. However, these figures do not reflect the number of tokens actually required to complete a single task. Artificial Analysis found that an average task in its Intelligence Index costs approximately $2.29 using Sonnet 5, which is notably higher than the $1.97 per task measured for Opus 4.8, a model with higher nominal token prices.

The discrepancy arises because Sonnet 5 exhibits much higher token consumption during complex operations. At maximum performance settings, the model burns through roughly 40 percent more output tokens per task than Sonnet 4.6. This trend is even more pronounced in agent-based knowledge work benchmarks such as AA-Briefcase and GDPval-AA, where the new model runs nearly three times as many agent loops as its previous iteration. Consequently, a task that previously cost about $1.20 on Sonnet 4.6 has nearly doubled in price under the new architecture.
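The relationship between flat per-token prices and a rising per-task bill can be sketched with a few lines of arithmetic. The snippet below is a minimal illustration, not Artificial Analysis's methodology: the prices are the list prices quoted above, but the token counts are hypothetical values chosen only to make the numbers easy to follow, and the function name task_cost is an assumption of this example.

```python
# Nominal list prices quoted in the article (USD per million tokens).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def task_cost(input_tokens: float, output_tokens: float) -> float:
    """Return the USD cost of one task at the nominal per-token prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# HYPOTHETICAL token counts, not measured data.
baseline_input, baseline_output = 200_000, 40_000

baseline = task_cost(baseline_input, baseline_output)
# Same task with 40 percent more output tokens and unchanged input tokens.
more_output = task_cost(baseline_input, baseline_output * 1.4)

print(f"baseline: ${baseline:.2f}")
print(f"+40% output tokens: ${more_output:.2f}")
```

With these illustrative counts, the per-task cost rises from $1.20 to $1.44 even though neither price changed. Additional agent loops would likely push the figure higher still, since each loop typically adds input tokens as well as output tokens.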

Performance gains versus reasoning limits

Despite the increased costs, Sonnet 5 does show measurable improvements over its predecessor in several key areas. The model achieved a six-point jump over Sonnet 4.6 on the Intelligence Index, tying with high-end configurations of GPT-5.5 for fifth place overall. Specific performance gains include:

A 9-point increase on Terminal-Bench v2.1

A 10-point jump on Humanity's Last Exam

A 7-point improvement on SciCode

However, the model still struggles with high-level reasoning compared to larger frontier models. In the CritPt physics reasoning test from Argonne National Laboratory and the University of Illinois, Sonnet 5 scored only 17 percent. While this is a 14-point improvement over its predecessor, it remains significantly behind competitors like GLM-5.2 and Claude Opus.

A recurring pattern of pricing opacity

This situation follows a historical trend for Anthropic. When the company launched Opus 4.7, token prices remained flat on paper, but a new tokenizer resulted in approximately 30 percent more tokens being counted for the same text. With Sonnet 5, this issue is compounded by the model's more agentic nature, which naturally requires more iterative processing and higher token counts to reach a conclusion.
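The effect of a tokenizer change on real prices follows directly from the token count: if the same text is split into 30 percent more tokens, each unit of text costs 30 percent more at an unchanged rate. The sketch below illustrates this under stated assumptions. The function name effective_price is hypothetical, and the $3 and $15 rates are reused from the Sonnet 5 price list purely for illustration, since the article does not give the Opus 4.7 rates.

```python
def effective_price(nominal_price_per_m: float, token_inflation: float) -> float:
    """Price for the amount of text that used to fit in one million tokens.

    token_inflation is the fractional increase in token count for the same
    text, e.g. 0.30 for the roughly 30 percent figure cited in the article.
    """
    return nominal_price_per_m * (1.0 + token_inflation)


# Illustrative nominal rates (USD per million tokens), assumed for this example.
for label, price in (("input", 3.00), ("output", 15.00)):
    adjusted = effective_price(price, 0.30)
    print(f"{label}: ${price:.2f} nominal -> ${adjusted:.2f} effective")
```

Under these assumptions, a nominal $3 rate behaves like $3.90 and a nominal $15 rate like $19.50 for the same amount of text.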

As Chinese competitors like DeepSeek V4 Pro and GLM-5.2 offer competitive performance at lower costs in the mid-range segment, Anthropic faces growing pressure for transparency. Industry observers suggest that providers should move toward pricing models based on standardized tasks or real-world jobs rather than raw token counts, which can shift substantially with internal changes such as a new tokenizer or more agentic behavior.

FAQ

How does Claude Sonnet 5 compare to its predecessor in terms of cost?

While nominal token prices remain the same, Sonnet 5 costs nearly double for tasks that previously cost $1.20 on Sonnet 4.6. This is due to increased model verbosity and agentic behavior requiring more iterative processing.

What are the specific performance improvements of Claude Sonnet 5?

The model achieved a six-point jump on the Intelligence Index, a nine-point increase on Terminal-Bench v2.1, a ten-point jump on Humanity's Last Exam, and a seven-point improvement on SciCode.

How does Claude Sonnet 5 perform in high-level physics reasoning?

In the CritPt physics reasoning test from Argonne National Laboratory and the University of Illinois, Sonnet 5 scored only 17 percent. Although this is a 14-point improvement over its predecessor, it remains behind competitors like GLM-5.2 and Claude Opus.
