Skip to content
Free · 1,000+ readers
Free · Independent
The daily record of artificial intelligence
← Back
AI

Cohere released Command A+, an open-weight MoE model, under Apache 2.0

Cohere described the model as its fastest and most powerful Command release, with a W4A4 variant that can be self-hosted on a single Nvidia B200 or two H100 GPUs, targeting enterprise sovereign AI and agentic workloads.

Sunday, May 24, 2026 · min

Cohere on May 20 released Command A+, a 218-billion-parameter mixture-of-experts model, under an Apache 2.0 open-weight license. The release is the Toronto-based company’s first MoE architecture in its Command family and marks a push to let enterprises run the model on their own hardware with a relatively modest GPU footprint.

The launch tests whether a model with 218 billion total parameters but only 25 billion active on each token can be economically served on premise, appealing to regulated organizations that require private AI infrastructure. The permissive license and self-hosting options come as enterprises increasingly evaluate on-premises alternatives to cloud APIs for sensitive workloads.

Command A+ uses a sparse decoder-only MoE framework with 128 experts, routing eight plus one shared expert per token. It accepts text and image inputs along with tool-use instructions, and outputs text, reasoning and tool-use responses. The model does not generate images, audio or video. Cohere stated that the model supports 48 languages, including all official EU languages, and carries a 128,000-token input context window with a 64,000-token maximum generation length. The model is accessible through Hugging Face, Cohere’s Model Vault, its API, and a Hugging Face Space demo.

Three quantizations are available: a W4A4 variant that Cohere said can be deployed on a single Nvidia B200 or two H100 GPUs, an FP8 variant requiring two B200 or four H100 GPUs, and a BF16 version requiring four B200 or eight H100. These specifications come from Cohere’s documentation; real-world self-hosting performance on such configurations has not been independently verified.

On May 21, Artificial Analysis, an independent benchmark provider, published its evaluation, assigning Command A+ a score of 37 on its Intelligence Index. That places it above the average for comparable open-weight models. However, Artificial Analysis noted that the model trailed some peers on hard science and agentic coding benchmarks, undercutting the notion of uniform dominance. Cohere separately described the model as its fastest and most powerful Command release, based on internal comparisons.

Nick Frosst, Cohere’s co-founder, said in a statement that the model was built for organizations that need to maintain data within their own infrastructure. The model’s native support for tool use differentiates it from many open-weight peers, enabling agentic capabilities such as calling APIs and interacting with enterprise software. Cohere is targeting use cases including retrieval-augmented generation, multilingual document processing, and sovereign AI, though it did not disclose any named enterprise customers for the new release.

Weights are available under the Apache 2.0 license, but Cohere has not confirmed the openness of training data, training code, or evaluation tooling, making the descriptor “open-source model” imprecise. The term “open weights under Apache 2.0” more accurately reflects what is known.

Command A+ arrives more than a year after Command A, a 111-billion-parameter model released in March 2025 with a 256,000-token context window. The architectural shift to MoE and the reduced input context length from 256,000 to 128,000 tokens mark a trade-off that prioritizes computational efficiency over maximum context range.

No technical report for Command A+ has been published—only a model card and a blog post. Artificial Analysis also lists a 190,000-token context window, which may combine input and output capacity rather than reflecting the 128,000-token input cap per Cohere’s documentation. A Cohere spokesperson did not immediately clarify the discrepancy. Without third-party deployment tests, the memory, throughput, and latency of the W4A4 variant under common serving stacks remain unverified.

The release broadens the open-weight landscape for MoE architectures and gives enterprises a new option for on-premises AI, but its real-world value will depend on independent testing of inference throughput, memory consumption, and accuracy across the tasks Cohere is targeting—none of which has been independently published.

— End —