DeepSeek said on its official pricing page Tuesday that it will keep API access to its flagship V4-Pro model at a quarter of the original cost after a 75% launch discount expires on May 31, extending a string of aggressive price cuts by the Hangzhou-based startup. The update, first reported by Reuters on May 23, makes permanent a rate structure that began as a short-term developer promotion. It signals that the company intends to compete primarily on cost, widening the gap with Western providers that charge far more for comparable large language models.
The move makes V4-Pro one of the lowest-cost long-context models of its scale available via API, sharply raising pressure on both proprietary Western labs and Chinese rivals already locked in a price war. Under the new permanent pricing, cache-hit input will cost $0.003625 per million tokens, cache-miss input $0.435 per million and output $0.87 per million. Each figure is exactly one quarter of the originally posted rates of $0.0145, $1.74 and $3.48. In yuan terms, prices now range from 0.025 to 6 yuan per million tokens, according to Reuters.
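To make the arithmetic concrete, the following minimal Python sketch derives the permanent rates from the original ones and estimates the bill for a workload. The rates are the dollar figures quoted above; the function name `estimate_cost` and the sample token counts are hypothetical, chosen only for illustration, and do not reflect any DeepSeek SDK or billing API.

```python
from decimal import Decimal

# Originally posted V4-Pro rates in USD per million tokens, as quoted above.
ORIGINAL_RATES = {
    "cache_hit_input": Decimal("0.0145"),
    "cache_miss_input": Decimal("1.74"),
    "output": Decimal("3.48"),
}

# The permanent rates are one quarter of the original (a 75% discount).
PERMANENT_RATES = {name: rate / 4 for name, rate in ORIGINAL_RATES.items()}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(cache_hit_tokens, cache_miss_tokens, output_tokens,
                  rates=PERMANENT_RATES):
    """Return the estimated cost in USD for the given token counts.

    Hypothetical helper: it only multiplies token counts by the
    per-million-token rates and ignores any other billing rules.
    """
    total = (
        Decimal(cache_hit_tokens) * rates["cache_hit_input"]
        + Decimal(cache_miss_tokens) * rates["cache_miss_input"]
        + Decimal(output_tokens) * rates["output"]
    )
    return total / TOKENS_PER_MILLION


# Illustrative workload (assumed numbers, not from the article):
# 2M cache-hit input tokens, 1M cache-miss input tokens, 500k output tokens.
discounted = estimate_cost(2_000_000, 1_000_000, 500_000)
original = estimate_cost(2_000_000, 1_000_000, 500_000, rates=ORIGINAL_RATES)
print(f"permanent pricing: ${discounted}")
print(f"original pricing:  ${original}")
```

Because every rate is cut by the same factor, any mix of cached input, uncached input and output costs one quarter of what it would have at the original prices.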
Launched in preview on April 24, V4-Pro carries 1.6 trillion total parameters—49 billion of them active during inference—and supports context windows of up to 1 million tokens. DeepSeek described the model as competitive with top closed-source alternatives, though independent benchmarks have not yet been widely published. The new rate card places V4-Pro’s full output cost below what many enterprises pay for much smaller or less capable models from major Western providers. At most competing labs, a model with similar specifications would be priced at multiples of these new rates.
The permanent pricing follows a rapid series of cuts. An initial 75% promotion for developers, launched shortly after the April preview, was originally set to end on May 5 and was then extended to May 31. The pricing page now shows that the post-promotion price will remain at the discounted level, with the struck-through original figures still visible. Separately, on April 26, DeepSeek slashed its cache-hit input prices across all models to one-tenth of launch levels, a move that made repeated queries dramatically cheaper for high-volume applications. The across-the-board cache-hit reduction, together with V4-Pro’s permanent discount, makes DeepSeek’s suite especially attractive for deployments that repeatedly reuse the same prompts.
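The effect of prompt reuse on input costs can be sketched as a weighted average of the cache-hit and cache-miss rates. The sketch below uses V4-Pro's permanent dollar rates as reported; the function name `blended_input_rate` and the hit fractions shown are assumptions for illustration, since real cache-hit ratios depend on the workload.

```python
# Permanent V4-Pro input rates in USD per million tokens, as reported.
CACHE_HIT_RATE = 0.003625
CACHE_MISS_RATE = 0.435


def blended_input_rate(cache_hit_fraction,
                       hit_rate=CACHE_HIT_RATE,
                       miss_rate=CACHE_MISS_RATE):
    """Return the average input cost per million tokens.

    cache_hit_fraction is the share of input tokens served from cache,
    between 0.0 and 1.0. Hypothetical helper for illustration only.
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return (cache_hit_fraction * hit_rate
            + (1.0 - cache_hit_fraction) * miss_rate)


# Assumed hit fractions; actual values vary by application.
for fraction in (0.0, 0.5, 0.9):
    rate = blended_input_rate(fraction)
    print(f"{fraction:.0%} cache hits: ${rate:.6f} per million input tokens")
```

Since the cache-hit rate is 120 times lower than the cache-miss rate, the blended input cost falls almost in proportion to the share of tokens served from cache.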
The price lock-in intensifies a cycle of discounting that outlets including Bloomberg have termed a Chinese AI price war, as domestic rivals such as Alibaba, Moonshot and MiniMax have also cut API rates aggressively. DeepSeek’s latest step challenges the premium pricing sustained by OpenAI, Anthropic and Google for frontier-tier offerings, particularly for cost-sensitive developer workloads. The permanent mark-down raises questions about whether such premium pricing can hold if high-performance alternatives persist at much lower rates. No competitor has publicly responded to the move, however, and any direct margin impact remains unverified.
DeepSeek’s pricing page notes that the company reserves the right to adjust prices at any time, meaning the new rates represent current policy rather than an irrevocable guarantee. Token pricing alone, moreover, captures only one dimension of total cost: caching behaviour, latency, concurrency limits and reliability all shape enterprise economics. Reuters reported that increased availability of Huawei’s Ascend 950 chips could support DeepSeek’s ability to sustain such pricing, but the company has not confirmed that connection.
For developers and enterprise procurement teams evaluating API budgets, the permanent discount sharply reshapes the arithmetic of model selection. Even without immediate countermoves from rivals, the gap between what the cheapest high-context models cost and what premium labs charge is already widening, which raises the value of performance data that can justify the differential.