DeepSeek’s official pricing page states that the 75%-discounted rate for its V4-Pro model API will become the new list price when the promotion expires on 31 May, setting the cost of frontier inference at a quarter of its original level. Reuters first reported the move on 23 May, citing a company statement, and noted that the Chinese AI lab’s most advanced model will stay at its current promotional rates, subject to future adjustments.
The permanent reduction extends a wave of aggressive pricing in China’s AI market and sets a new low for inference costs on a model that DeepSeek claims rivals top closed-source alternatives. For developers using the lab’s own API, the shift cuts input and output costs by 75% and increases pressure on Western and Chinese competitors to respond.
The currently listed V4-Pro pricing, which will become the standard on 1 June, sets cache-hit input at $0.003625 per million tokens, cache-miss input at $0.435 and output at $0.87, each exactly one quarter of the launch price. In yuan terms, the range is 0.025 to 6 yuan per million tokens, down from 0.1 to 24 yuan, Reuters reported. A separate cut for cache-hit inputs across all models, to one tenth of launch pricing, took effect on 26 April.
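For a rough sense of what these rates mean for a bill, the arithmetic can be sketched in Python. This is a minimal illustration, not an official calculator: the function name `estimate_cost` and the workload figures are hypothetical, the rates are the per-million-token dollar figures quoted above, and the sketch assumes billing is strictly linear in token count, with no minimums, taxes or rounding rules.

```python
from decimal import Decimal

# USD per million tokens for V4-Pro, as quoted in the article.
PRICE_PER_MILLION = {
    "cache_hit_input": Decimal("0.003625"),
    "cache_miss_input": Decimal("0.435"),
    "output": Decimal("0.87"),
}
MILLION = Decimal(1_000_000)


def estimate_cost(cache_hit_tokens, cache_miss_tokens, output_tokens):
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumes cost scales linearly with tokens in each billing category.
    """
    usage = {
        "cache_hit_input": cache_hit_tokens,
        "cache_miss_input": cache_miss_tokens,
        "output": output_tokens,
    }
    return sum(
        PRICE_PER_MILLION[kind] * Decimal(tokens) / MILLION
        for kind, tokens in usage.items()
    )


# Illustrative workload: 2M cached input, 1M uncached input, 500k output tokens.
cost = estimate_cost(2_000_000, 1_000_000, 500_000)
print(f"Estimated cost: ${cost} USD")
```

Using `Decimal` rather than floats keeps the sub-cent cache-hit rate exact, which matters when the smallest price is a fraction of a cent per million tokens.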
DeepSeek launched V4-Pro on 24 April as the higher-end option in its V4 preview release. The model has 1.6 trillion parameters, 49 billion of them active, and a 1-million-token context window. Independent benchmarking firm Artificial Analysis later ranked it top for cost efficiency after the 75% discount, the South China Morning Post reported on 24 May.
The pricing page carries a disclaimer that product prices may vary and that DeepSeek reserves the right to adjust them, so the new rates represent the ongoing official price rather than an irrevocable guarantee. The discount was initially introduced as a temporary promotion that was set to expire on 5 May, then extended to 31 May before the company decided to keep the lower levels.
Reuters noted that DeepSeek did not disclose whether the cut was linked to increased supply of Huawei’s Ascend 950 chips, a potential enabler for lower-cost inference. The company offered no rationale for sustaining the lower price.
Open questions remain. DeepSeek’s rate limit page sets a concurrency cap of 500 for V4-Pro, which could constrain access under heavy workloads. No independent data exists on latency, uptime or reliability at the new pricing, and it is unclear whether resellers or cloud marketplaces will pass on the full discount. No confirmed pricing reactions from OpenAI, Anthropic, Google, Alibaba or ByteDance have been reported.
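For developers planning around the 500-request concurrency cap, one common client-side approach is to bound the number of in-flight requests with a semaphore. The following is a sketch under stated assumptions: `run_bounded` and `fake_call` are hypothetical names, `fake_call` is a stand-in that makes no network request, and nothing here reflects how DeepSeek actually enforces its limit on the server side.

```python
import asyncio

MAX_CONCURRENCY = 500  # cap quoted in the article for V4-Pro


async def run_bounded(jobs, limit=MAX_CONCURRENCY):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    Hypothetical helper: results come back in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_call():
    """Stand-in for an API request; sleeps briefly instead of calling a server."""
    await asyncio.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    results = asyncio.run(run_bounded([fake_call] * 2000))
    print(len(results), "calls completed")
```

A client-side limit like this only prevents one process from exceeding the cap; several processes sharing an account would need to coordinate their limits separately.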
For developers weighing frontier model costs, the reset provides a new reference point that undercuts many alternatives, though its durability hinges on capacity and future adjustments. The move puts pressure on model providers globally to justify their premium rates.
