DeepSeek makes 75% V4-Pro API discount permanent after May 31

The new standardized rates drop to $0.003625 for cached input per million tokens, raising pressure on rival AI API pricing even as DeepSeek’s margins and chip-supply arrangements remain undisclosed.

Wednesday, May 27, 2026 · min

DeepSeek will permanently slash the price of accessing its flagship V4-Pro large language model by 75% after the current discount promotion expires in late May, converting what was a limited-time offer into the model’s official baseline rate. The Chinese startup’s API pricing page says that after the promotional window ends at 15:59 UTC on May 31, 2026, “the model API pricing will be officially adjusted to 1/4 of the original price.” The decision, first reported by Reuters and Bloomberg-syndicated outlets on May 23, immediately intensifies the global price war in AI inference.

The move entrenches DeepSeek’s position as the most aggressively priced frontier-model provider. While no established competitor has yet responded, the new rates—as low as $0.003625 per million tokens for cached input—put pressure on the developer-facing pricing models of OpenAI, Anthropic, Google, and the leading Chinese cloud players. For enterprises that are already benefiting from quickly falling token costs, a permanent cut removes uncertainty about when the bargain might end.

Under the revised schedule, DeepSeek V4-Pro will charge $0.003625 for cached input, $0.435 for cache-miss input, and $0.87 for output, all per million tokens. Before the promotion, the listed rates were $0.0145, $1.74, and $3.48. The Chinese-language page shows the same move in yuan: the price drops to 0.025, 3, and 6 yuan, from the original 0.1, 12, and 24 yuan. The model is the largest in the V4 family introduced in April, with 1.6 trillion total parameters, 49 billion active parameters, a 1-million-token context window, and a maximum output of 384,000 tokens.

DeepSeek released the V4 family, including the smaller V4-Flash, on April 24, 2026, and quickly followed with separate price cuts for input caching two days later. On April 27, Bloomberg reported the initial 75% V4-Pro discount as a promotional salvo in an intensifying Chinese AI price war that also involves Alibaba, Baidu, and ByteDance. The discount was originally listed with a May 5 expiration; the official page now shows a May 31 end date, indicating the promotion was extended before being made the new permanent baseline.

Reuters’ coverage of the permanent cut noted that DeepSeek did not disclose whether increased supply of Huawei Ascend 950 chips enabled the move. The startup’s inference costs and profit margins are not public, leaving open the question of whether the new prices are loss-leading or genuinely sustainable. The pricing page also carries the fine print that DeepSeek “reserves the right to adjust the prices,” meaning the 75% reduction is the current official tariff, not a contractual guarantee. No other major API provider has announced a matching price cut, and none has publicly commented on the competitive pressure.

It is not yet clear whether the new rates apply to all API users or are subject to quotas, geography-based limitations, or minimum-commitment terms. The pricing documentation does not address those details. The exact timing of when the page was updated with the permanent-adjustment language is also unknown; no separate blog post or press release from DeepSeek has surfaced beyond the live pricing page.

The permanent cut marks another step in the rapid commoditisation of frontier AI inference. For the developers and enterprises that buy tokens, the message is unambiguous—unit costs keep falling—but the business model of the companies providing those tokens remains largely unverifiable. As the discount becomes the norm on June 1, the spotlight will shift to whether rival labs can afford to stay on the same pricing path.

— End —

DeepSeek makes 75% V4-Pro API discount permanent after May 31

Related

NVIDIA releases Cosmos 3 physical AI models, with Edge variant still to come

Anthropic released Claude Opus 4.8, introducing effort controls and a faster preview mode

OpenAI closes Windows gap with Codex remote control and computer use

Alibaba released Qwen3.7-Max, a proprietary agent model