Skip to content
Free · 1,000+ readers
Free · Independent
The daily record of artificial intelligence
← Back
AI

DeepSeek makes V4-Pro AI model price cut permanent

The move locks in output pricing at $0.87 per million tokens, roughly 1/29th of Anthropic's comparable rate, intensifying the AI inference price war and setting a new cost baseline for developers.

Sunday, May 24, 2026 · min

DeepSeek said it would lock in a 75% discount on its flagship V4-Pro model as the permanent list price after the current promotion expires at 15:59 UTC on May 31. The decision, disclosed on the Chinese startup’s API pricing page and confirmed to Reuters on May 23, makes the cut the deepest sustained price reduction among frontier AI API providers. It threatens to accelerate an already fierce price war and reshape developer stack decisions.

By turning a six-week promotional tactic into a structural reset, DeepSeek forced competitors to contend with list prices a fraction of Western rates. The new official pricing, applicable to the 1.6-trillion-parameter model with a 1-million-token context window, directly undercuts Anthropic’s comparable output by a factor of nearly 29. For AI developers and startups, the move alters the cost calculations for large-scale inference—though differences in model capability, data residency and censorship restrictions persist.

The V4-Pro pricing page now lists per-million token rates of $0.003625 for cache-hit input, $0.435 for cache-miss input and $0.87 for output, each one-quarter of the original launch levels of $0.0145, $1.74 and $3.48. The Chinese-language page shows equivalent drops to 0.025, 3 and 6 yuan from 0.1, 12 and 24. The temporary 75% discount, first reported by Bloomberg on April 27, was initially due to end on May 5, later extended to May 31 before the company decided to make it permanent.

DeepSeek introduced V4-Pro on April 24, alongside the smaller V4-Flash, as the company’s latest flagship. The model packs 1.6 trillion total parameters with 49 billion active parameters, a 384,000-token maximum output and supports both OpenAI ChatCompletions and Anthropic APIs. A week after launch, the startup slashed cache-hit input prices across all models to one-tenth of launch levels—a cut the V4-Pro’s current published rate does not reflect, suggesting complex internal pricing adjustments.

Short-term discounting among Chinese AI companies has been common, with Alibaba and Baidu previously cutting inference prices temporarily. Converting a deep promotional rate into the official list price, however, marks a more durable escalation that puts immediate pressure on Western closed-source providers. Reuters reported that DeepSeek did not disclose whether increased supply of Huawei Ascend 950 chips enabled the sustained price cut, but analysts have pointed to that as a possible factor.

On a list-price basis, V4-Pro’s $0.87 per million output tokens compares with Anthropic’s $25 for Claude Opus 4.7, making DeepSeek’s model roughly 28.7 times cheaper. OpenAI and Google have yet to respond with their own adjustments, but the gap is wide enough to force pricing reviews at enterprises that build on top of API access. While the price gap does not account for differences in reasoning, latency, safety features or enterprise service guarantees, it injects a new variable into the decision matrix for CIOs and engineering leaders weighing build-versus-buy alternatives for AI workloads.

The pricing page retains a standard clause that DeepSeek may adjust prices in the future, meaning the new rates are the new baseline, not an irrevocable guarantee. Independent benchmark scores for V4-Pro versus Western frontier models remain unpublicized, and factors such as data governance rules under Chinese law, rate limits and availability zones could restrict adoption outside China for many enterprises. No independent verifications of enterprise workload migration have emerged, and DeepSeek has not disclosed customer numbers.

By locking in a dramatic cut as the official price, DeepSeek has reset the cost floor for AI inference and shifted the burden onto Western providers to explain their premium. For developers, the calculus now hinges not only on token costs but on whether the model’s quality and compliance posture can hold up under enterprise scrutiny.

— End —