DeepSeek makes 75% V4-Pro API price cut permanent

The Chinese AI lab said its highest-capacity model’s cloud inference pricing will stay at one-quarter of launch levels after May 31, intensifying competition among LLM providers.

Tuesday, May 26, 2026

DeepSeek said on Saturday that the 75% promotional discount on its V4-Pro API, originally due to expire at the end of May, will be made permanent. The discounted rates become the model’s list price, locking in a cut that brings the cost of accessing one of the most capable large language models to one-quarter of its launch level.

The move, first reported by Reuters and confirmed on DeepSeek’s own pricing pages, marks the latest escalation in a price war that has gripped China’s AI sector since 2024 and that increasingly shapes the economics of enterprise adoption worldwide. For developers using the 1.6-trillion-parameter model for long-context tasks, the new permanent rates mean the price advantage of DeepSeek’s promotional period will not evaporate.

According to the updated documentation, the lower prices, effective after May 31 at 15:59 UTC, will be $0.003625 per million tokens for cache-hit input, $0.435 per million for cache-miss input, and $0.87 per million for output. The struck-through original list prices were $0.0145, $1.74, and $3.48, respectively. In yuan terms, the per-million-token figures dropped from 0.1, 12, and 24 to 0.025, 3, and 6. The smaller V4-Flash model, with 284 billion parameters, was not affected by the change.
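The arithmetic behind those figures can be checked with a short Python sketch. The prices are the ones quoted above; the function names and the sample workload (800,000 cached input tokens, 200,000 uncached input tokens, 4,000 output tokens) are hypothetical and chosen only for illustration.

```python
# USD prices per million tokens for V4-Pro, as quoted in the article.
LAUNCH = {"cache_hit": 0.0145, "cache_miss": 1.74, "output": 3.48}
PERMANENT = {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87}


def discount(old: float, new: float) -> float:
    """Return the fractional price reduction from old to new."""
    return 1 - new / old


def request_cost(prices: dict, hit_tokens: int, miss_tokens: int,
                 output_tokens: int) -> float:
    """Return the USD cost of one request (prices are per million tokens)."""
    total = (hit_tokens * prices["cache_hit"]
             + miss_tokens * prices["cache_miss"]
             + output_tokens * prices["output"])
    return total / 1_000_000


if __name__ == "__main__":
    # Each tier should show the same 75% reduction.
    for tier in LAUNCH:
        print(f"{tier}: {discount(LAUNCH[tier], PERMANENT[tier]):.0%} lower")

    # Hypothetical long-context request (illustrative numbers, not from the article):
    # 800,000 cached input tokens, 200,000 uncached input tokens, 4,000 output tokens.
    workload = (800_000, 200_000, 4_000)
    print(f"launch:    ${request_cost(LAUNCH, *workload):.4f}")
    print(f"permanent: ${request_cost(PERMANENT, *workload):.4f}")
```

Under these assumptions the same request costs roughly 37 cents at launch pricing and roughly 9 cents at the permanent rate.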

DeepSeek introduced the V4 series in preview on April 24, positioning V4-Pro as the higher-performance model in the family. It carries 1.6 trillion total parameters, of which 49 billion are active, and supports a 1-million-token context window, characteristics suited to agentic coding, large-document analysis, and other extended inference workloads. Its concurrency limit of 500, compared with 2,500 for the V4-Flash variant, reflects its heavier computing demands.

The lab has a history of aggressive pricing. In February 2025 it offered developers off-peak discounts of up to 75% on its V3 and R1 models, and a broader Chinese LLM price war erupted in May 2024 when Tencent, iFlytek, ByteDance, and others cut fees. The temporary 75% cut on V4-Pro, applied April 26, was initially expected to lapse, and some third-party trackers had pegged the expiration to early May. DeepSeek’s latest update ends that uncertainty by formalizing the discount as the baseline.

Why DeepSeek can sustain the cut permanently remains unexplained. Reuters reported that the company did not link the decision to increased availability of Huawei Ascend 950 chips, even though Tom’s Hardware had earlier suggested that pricing would fall significantly once Ascend 950 supernodes entered volume production. Gains from inference optimization, lower compute costs, or a strategic push for market share are all plausible, but no official rationale has been provided.

The decision places immediate pressure on competitors such as OpenAI, Anthropic, and Google, whose comparable models can cost several times as much. It also changes the pricing calculus for enterprises that use API gateways and resellers, though actual savings will hinge on cache-hit ratios and output consumption. Absent a counter-move from rivals, the permanent reduction deepens the industry’s shift toward ever-cheaper inference, testing the margin assumptions that underpin the LLM-as-a-service market.
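How much the cache-hit ratio matters can be illustrated with a rough Python sketch of the blended input price. The three rates are the permanent V4-Pro prices reported in this article; the helper names and the monthly token volumes are assumptions made for the example, not figures from DeepSeek.

```python
# Permanent V4-Pro USD prices per million tokens, as quoted in the article.
CACHE_HIT, CACHE_MISS, OUTPUT = 0.003625, 0.435, 0.87


def blended_input_price(hit_ratio: float) -> float:
    """Effective USD price per million input tokens at a given cache-hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * CACHE_HIT + (1 - hit_ratio) * CACHE_MISS


def monthly_cost(input_mtok: float, output_mtok: float, hit_ratio: float) -> float:
    """USD bill for a month, with token volumes given in millions of tokens."""
    return input_mtok * blended_input_price(hit_ratio) + output_mtok * OUTPUT


if __name__ == "__main__":
    # Hypothetical volume: 1,000M input tokens and 100M output tokens per month.
    for ratio in (0.0, 0.5, 0.9):
        print(f"hit ratio {ratio:.0%}: ${monthly_cost(1000, 100, ratio):,.2f}")
```

Because all three rates fell by the same 75%, the percentage saving against launch pricing is the same at any hit ratio. What the ratio changes is the absolute bill, and with it the gap to a rival’s pricing.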
