Anthropic unveils gated AI model that completed 11 of 20 simulated cyber attacks

The UK AISI found the model completed 11 of 20 simulated corporate-network attacks, a sharp improvement over Claude Opus 4.5, while Anthropic claims thousands of unconfirmed zero-days across major operating systems.

Monday, May 18, 2026

On April 7, 2026, Anthropic announced Project Glasswing and Claude Mythos Preview, a gated research preview of a cyber‑focused AI model. The same day, the UK AI Security Institute said the model had completed 11 of 20 simulated end‑to‑end corporate‑network attacks, up from 4 of 20 for the earlier Claude Opus 4.5. The institute described the gain as a "step change" in autonomous cyber capability.

The preview, restricted to a set of vetted partners, illustrates how quickly frontier labs are productizing advanced coding and reasoning abilities into structured offensive and defensive tools. The dual‑use risk is explicit: the same technology can shrink patching cycles for defenders or enable sophisticated intrusions. Anthropic points to gated access, partner vetting and responsible disclosure as its primary safeguards, but the demonstration also sharpens the tension between public vulnerability disclosure and the risk of weaponization.

According to Anthropic’s own reporting, Mythos Preview found 17 previously unknown vulnerabilities in mature open‑source projects, 10 of which had been patched or publicly disclosed by the time of the announcement. The model reproduced 332 known historical vulnerabilities and scored 83.1% on the CyberGym autonomous cyber benchmark, a metric defined by the company. Still, Anthropic stressed that human expertise and validation remained central to the process, even when the model operated autonomously in directed tests. The company also said it identified "thousands" of zero‑day vulnerabilities in major operating systems and browsers. However, severity validation relied on a manually reviewed subset of only 198 reports, in which expert contractors agreed exactly with the model's severity rating in 89% of cases and came within one severity level in 98%. The vast majority of suspected vulnerabilities have not been disclosed, which prevents independent review.

Independent replication remains sparse. The UK AISI evaluation is the only government‑backed external assessment released publicly. In the software community, Mozilla’s March 2026 security advisory for Firefox 148 credited Anthropic researchers using Claude for multiple vulnerabilities, and a May 2026 Mozilla Hacks post described using Mythos Preview alongside other models to find and fix latent Firefox bugs. OpenBSD published a security patch that also references vulnerability discovery linked to Anthropic‑related research. The official record for CVE‑2026‑4747, a FreeBSD zero‑day cited by Anthropic, is more nuanced. The National Vulnerability Database received the CVE from FreeBSD on March 26, 2026, before Anthropic’s April 7 launch, and the entry distinguishes kernel remote code execution by an authenticated user from unauthenticated client risk in userspace applications. The precise discovery timeline and the model’s role remain disputed.

Project Glasswing launch partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA and Palo Alto Networks, with access extended to more than 40 additional organizations. Anthropic committed up to $100 million in model usage credits for participants and donated $2.5 million to the Alpha‑Omega/OpenSSF initiative via the Linux Foundation and $1.5 million to the Apache Software Foundation. After credits are exhausted, usage will be priced at $25 per million input tokens and $125 per million output tokens.

On April 21, 2026, Bloomberg reported that a small group of unauthorized users had accessed Mythos Preview via a third‑party contractor. Anthropic, according to the report, said it was investigating and had no indication of access beyond that contractor’s systems. No public forensic report has been released.

Government agencies responded cautiously. German BSI officials, per media accounts, stated they took the announcement seriously and expected significant changes in vulnerability management. The UK AISI’s evaluation is public, but other major bodies such as the US Cybersecurity and Infrastructure Security Agency have not released independent assessments.

The story still carries large unknowns. It is not clear how many of the alleged thousands of vulnerabilities have been independently validated by non‑Anthropic experts, what specific findings consortium partners have confirmed as attributable solely to Mythos Preview, or which access controls were bypassed in the unauthorized access incident. The severity distribution of the unpatched findings and how representative the 17 disclosed vulnerabilities are of the model’s broader behavior also remain open questions.

For chief information security officers and institutional investors, Mythos Preview signals that AI‑assisted vulnerability discovery is moving from laboratory benchmarks toward operational reality. The capabilities, however provisional, put pressure on disclosure norms and patch cadence, and will force security teams to weigh immediate defensive benefits against the risk of premature weaponization. Without broader independent testing and deeper consortium transparency, the full risk‑return calculus will take months, and more open evaluation, to come into focus.
