According to Yellow, the latest head-to-head evaluations between OpenAI and Anthropic highlight distinct strengths in their newest frontier models. The comparison centers on GPT-5.6 Sol, the flagship of OpenAI's recent three-tier release, and Claude Fable 5, which recently returned to global availability following a brief regulatory hiatus.
Benchmark performance and terminal capabilities
OpenAI reports that GPT-5.6 Sol scored 88.8% on Terminal-Bench 2.1, a benchmark designed to test command-line coding agents that must plan, iterate, and coordinate various tools. In the compute-heavy Ultra mode, which deploys coordinated subagents to handle complex tasks, the score rises to 91.9%. That is the highest published mark on the Terminal-Bench chart to date.
By comparison, reviewers noted that Claude Fable 5 trails Sol slightly in terminal-specific tests, with scores ranging between 83.4% and 84.3%. Sol is also drawing attention for its efficiency: it reportedly matches Mythos-class performance on the ExploitBench security suite while using approximately one-third of the output tokens. This cost compression is considered a vital factor for long-running autonomous agent workflows.
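As a rough illustration of why the token reduction matters for long-running agents, the sketch below estimates output-token spend for a single run. Only the one-third ratio comes from the report. The baseline token count and the per-million output price are hypothetical placeholders, and the sketch assumes the same price applies to both runs, which may not hold across vendors.

```python
# Illustrative only: token counts and price are hypothetical placeholders.
# Only the "about one-third of the output tokens" ratio comes from the report.

def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of emitting `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million

BASELINE_TOKENS = 3_000_000   # hypothetical output tokens for a baseline run
TOKEN_RATIO = 1 / 3           # reported ratio of output tokens used
PRICE_PER_MILLION = 30.0      # hypothetical output price in dollars

baseline_cost = output_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced_cost = output_cost(round(BASELINE_TOKENS * TOKEN_RATIO), PRICE_PER_MILLION)
savings_fraction = 1 - reduced_cost / baseline_cost

print(f"baseline: ${baseline_cost:.2f}, reduced: ${reduced_cost:.2f}, "
      f"saved: {savings_fraction:.0%}")
```

At equal per-token prices the saving tracks the token ratio directly, so the result does not depend on the placeholder values chosen here.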
Software engineering and accessibility gaps
Despite Sol's terminal dominance, Claude Fable 5 remains the leader in SWE-Bench Pro, which measures the ability to provide end-to-end fixes for real GitHub issues. The model scored 80.3% on this benchmark, well ahead of the older GPT-5.5 model at 58.6%. OpenAI has not yet published a GPT-5.6 figure for SWE-Bench Pro, and analysts suggest that closing a gap of 21.7 percentage points may require more than an incremental update.
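The size of that gap can be checked with a few lines of arithmetic. The sketch below uses only the two SWE-Bench Pro scores quoted in the article. The function names are illustrative, and the "relative gain" framing is one possible way to read the gap rather than a metric either company reports.

```python
# Scores as quoted in the article, in percent.
FABLE_5_SWE_BENCH_PRO = 80.3
GPT_5_5_SWE_BENCH_PRO = 58.6

def point_gap(leader: float, trailer: float) -> float:
    """Absolute gap in percentage points, rounded to one decimal place."""
    return round(leader - trailer, 1)

def relative_gain_needed(leader: float, trailer: float) -> float:
    """Fractional improvement the trailing score needs to match the leader."""
    return (leader - trailer) / trailer

gap = point_gap(FABLE_5_SWE_BENCH_PRO, GPT_5_5_SWE_BENCH_PRO)
gain = relative_gain_needed(FABLE_5_SWE_BENCH_PRO, GPT_5_5_SWE_BENCH_PRO)

print(f"gap: {gap} points, relative gain needed: {gain:.1%}")
```

On these figures, a successor to GPT-5.5 would need to improve its score by roughly 37% in relative terms to draw level.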
The choice between the two models currently depends on specific use cases and access levels:
- GPT-5.6 Sol is optimized for terminal-driven agents and offers lower pricing at $5 per million input tokens.
- Claude Fable 5 is preferred for repository-level fixes and remains globally available as of July 1.
- Sol is currently restricted to a limited preview for roughly 20 government-cleared partners due to security considerations.
The competitive landscape was further complicated in June when regulatory concerns forced Anthropic's models offline briefly following a reported jailbreak by Amazon researchers. While Mythos 5 was restricted to vetted organizations, Fable 5 has been restored for the general public. These developments suggest that while both companies are pushing the limits of coding automation, the path to full deployment remains tied to rigorous security vetting.