Anthropic launched Claude Opus 4.8 on May 28, 2026, adding user-facing effort controls, a fast-mode research preview, and dynamic workflows for its Claude Code developer tool. The update, available immediately via the Anthropic API and major cloud platforms, prioritizes developer levers over a brute-force leap in raw model performance.
The release pushes further into the emerging practice of tunable inference, where reasoning depth becomes an explicit product variable. For enterprise teams, that means granular control over token consumption, latency, and cost inside a single model family, widening the set of use cases a single model can serve economically.
Effort control, available on claude.ai, Cowork, and as an API parameter, lets developers choose from five levels—low, medium, high, xhigh, and max—with “high” as the default. Lower effort aims at faster responses and fewer tokens, while higher effort engages deeper reasoning that can consume more resources. The feature stops short of a direct accuracy dial; Anthropic frames it as a trade-off between thoroughness and speed.
Standard API pricing remained unchanged at $5 per million input tokens and $25 per million output tokens. Alongside it, the company introduced a fast-mode research preview priced at $10 per million input tokens and $50 per million output tokens—one-third the cost of the previous Opus fast mode, which ran $30 and $150 respectively. The mode claims up to 2.5× higher output tokens per second, though the metric measures token throughput rather than guaranteed end-to-end task completion.
Fast mode is constrained to the Anthropic API; it cannot be used on Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry. That platform restriction limits its immediate reach for enterprises that have standardized on a particular cloud provider.
The company also released dynamic workflows in Claude Code as a research preview, enabling the tool to plan and execute work across tens to hundreds of parallel subagents. Available on Max, Team, Enterprise, and API/cloud plans—Enterprise off by default—the feature can consume substantially more tokens than a single Claude session, which could raise costs for unsuspecting teams.
Anthropic said the model is about four times less likely than its predecessor to let flaws in code it wrote go unremarked, according to the company’s own evaluations. No independent benchmarks were available at launch to corroborate that claim or to measure Opus 4.8’s performance on coding and agentic tasks relative to other frontier models.
The model supports a context window of 1 million tokens on the Anthropic API, Amazon Bedrock, and Vertex AI, but only 200,000 tokens on Microsoft Foundry. AWS confirmed same-day availability on its Bedrock service and Claude Platform.
For operations and engineering leads, Opus 4.8 sharpens the cost-control toolkit without resetting the pricing floor for inference. The absence of independent latency and accuracy data leaves early adopters with an open question: how much the effort levers actually save in real workloads, and at what quality trade-off.