According to The Decoder, independent evaluations of Anthropic's latest offering, Claude Sonnet 5, reveal a discrepancy between official pricing lists and real-world expenditures. While the company maintains that token prices remain unchanged from the previous generation, the data suggests that users may face significantly higher bills due to increased model verbosity and agentic behavior.
Hidden cost inflation in token usage
On paper, Anthropic has kept the pricing for Sonnet 5 consistent with its predecessor at $3 per million input tokens and $15 per million output tokens. However, these list prices do not capture how many tokens the model actually consumes to complete a single task. Artificial Analysis found that an average task in its Intelligence Index costs approximately $2.29 using Sonnet 5, which is notably higher than the $1.97 it costs on Opus 4.8, a model with higher per-token prices.
The discrepancy arises because Sonnet 5 exhibits much higher token consumption during complex operations. At maximum performance settings, the model burns through roughly 40 percent more output tokens per task than Sonnet 4.6. This trend is even more pronounced in agent-based knowledge work benchmarks such as AA-Briefcase and GDPval-AA, where the new model runs nearly three times as many agent loops as its previous iteration. Consequently, a task that previously cost about $1.20 on Sonnet 4.6 has nearly doubled in price under the new architecture.
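To make the arithmetic concrete, here is a minimal sketch of how a per-task bill follows from the list prices quoted above. The token counts are hypothetical values chosen only for illustration; they are not measurements from Artificial Analysis, and the function name `task_cost` is likewise an assumption of this example.

```python
# List prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task at the list prices above."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Hypothetical token counts for a single task (illustration only).
baseline = task_cost(input_tokens=200_000, output_tokens=40_000)

# Same input, but 40 percent more output tokens, as reported for
# Sonnet 5 at maximum performance settings.
verbose = task_cost(input_tokens=200_000, output_tokens=56_000)

print(f"baseline: ${baseline:.2f}")
print(f"verbose:  ${verbose:.2f}")
print(f"increase: {verbose / baseline - 1:.0%}")
```

In this sketch the per-token prices never change, yet the bill rises because the output token count does. Extra agent loops would push the total higher still, since each loop typically adds both input and output tokens.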
Performance gains versus reasoning limits
Despite the increased costs, Sonnet 5 does show measurable improvements over its predecessor. The model achieved a six-point jump over Sonnet 4.6 on the Intelligence Index, tying with high-end configurations of GPT-5.5 for fifth place overall.
However, the model still struggles with high-level reasoning compared to larger frontier models. In the CritPt physics reasoning test from Argonne National Laboratory and the University of Illinois, Sonnet 5 scored only 17 percent. While this is a 14-point improvement over its predecessor, it remains significantly behind competitors like GLM-5.2 and Claude Opus.
A recurring pattern of pricing opacity
This situation follows a historical trend for Anthropic. When the company launched Opus 4.7, token prices remained flat on paper, but a new tokenizer resulted in approximately 30 percent more tokens being counted for the same text. With Sonnet 5, this issue is compounded by the model's more agentic nature, which naturally requires more iterative processing and higher token counts to reach a conclusion.
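The effect of a tokenizer change on a flat list price can be sketched in a few lines. The 30 percent figure comes from the article; the helper name `effective_multiplier` and the second scenario, which stacks a 40 percent increase in output tokens on top, are assumptions made purely to show how such factors would multiply if both applied to the same workload.

```python
from math import prod


def effective_multiplier(*inflation_rates: float) -> float:
    """Combine independent token-inflation rates into one cost multiplier.

    Each rate is a fraction, e.g. 0.30 for 30 percent more tokens.
    """
    return prod(1 + rate for rate in inflation_rates)


# A tokenizer that counts about 30 percent more tokens for the same text
# raises the effective price of that text by the same factor.
tokenizer_only = effective_multiplier(0.30)

# Hypothetical scenario: the same workload also emits 40 percent more tokens.
combined = effective_multiplier(0.30, 0.40)

print(f"tokenizer only: {tokenizer_only:.2f}x")
print(f"combined:       {combined:.2f}x")
```

The point of the sketch is that these effects multiply rather than add, so two moderate increases can approach a doubling of cost while the published price per token stays the same.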
As Chinese competitors like DeepSeek V4 Pro and GLM-5.2 offer competitive performance at lower costs in the mid-range segment, Anthropic faces growing pressure for transparency. Industry observers suggest that providers should move toward pricing models based on standardized tasks or real-world jobs rather than raw token counts, which can shift substantially with changes to tokenizers or model behavior.