AI Just Completed Its First End-to-End Corporate Network Hack

Something crossed a threshold last month that deserves more attention than it's getting.

The UK's AI Security Institute (AISI) evaluated Anthropic's Claude Mythos Preview on a simulation called "The Last Ones": a 32-step corporate network attack scenario built by professional red teamers, covering everything from reconnaissance through to full domain takeover. The AISI estimates the task would take a human cyber security expert roughly 20 hours to complete end-to-end.

Claude Mythos Preview became the first AI model to complete the full scenario. It did so in 3 of 10 runs and achieved a 73% success rate on expert-level sub-tasks. OpenAI's GPT-5.5 followed weeks later with a nearly identical profile: 2 of 10 end-to-end completions and 71.4% on expert-level tasks.

To be clear about what "expert-level" means here: these are tasks the AISI tags as requiring more than ten years of human experience. In late 2023, frontier models completed such tasks less than 9% of the time. Now they complete them nearly three-quarters of the time. The AISI's own headline figure is stark: frontier cyber-offence capability is doubling roughly every four months, down from a doubling time of seven months at the close of 2025.
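To put the two doubling times side by side, here is a minimal sketch that converts a doubling time into an annual growth multiplier. It assumes steady exponential growth over a full year, which is my simplification rather than anything the AISI claims, and the function name is purely illustrative.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Multiplier over 12 months for a quantity that doubles every `doubling_months`.

    Assumes steady exponential growth, which is a simplification for illustration.
    """
    return 2 ** (12 / doubling_months)


# Doubling times quoted in the article: seven months, then four months.
for months in (7, 4):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> about x{factor:.2f} per year")
```

Under that assumption, a seven-month doubling time works out to roughly a 3.3x increase per year, while a four-month doubling time works out to 8x. Whether the trend holds for a full year is exactly the open question.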

There are caveats worth stating plainly. The AISI's range has no active defenders or defensive tooling, so these results don't yet prove efficacy against hardened targets running modern detection stacks. Model performance is also inconsistent: the same model at the same token budget can produce dramatically different results across runs. The AISI itself notes that current models struggle with creative reasoning, novel cryptographic challenges, and full-chain coherence when things go sideways. The Australian Cyber Security Centre described today's AI attackers as more like a "very competent script kiddie" than a nation-state operator.

But the AISI is also candid that these aren't permanent ceilings. They're temporary gaps in a rapidly improving system.

What I find most striking is the cost structure. A 100-million-token run with Opus 4.6 costs approximately $80. The UK's National Cyber Security Centre (NCSC) put the equivalent Mythos attempt at around £65. That's not a state-sponsored operation. That's a motivated amateur with a credit card. And the Australian signals agency explicitly warned that "the assumption hostile actors will lag frontier capabilities by many months is no longer safe", because open-weight models are already replicating many of the same vulnerability-discovery techniques.
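A rough back-of-the-envelope sketch shows why a per-attempt cost this low matters even when most runs fail. It treats the 3-of-10 completion result as a per-run success probability of 0.3 and assumes runs are independent and each cost about £65. Both are my assumptions for illustration: ten runs is a small sample, and the function names are hypothetical.

```python
def p_at_least_one(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs succeeds."""
    return 1 - (1 - p_single) ** attempts


def expected_cost_per_success(cost_per_attempt: float, p_single: float) -> float:
    """Average spend until the first success (mean of a geometric distribution)."""
    return cost_per_attempt / p_single


P_SINGLE = 0.3          # assumption: 3 of 10 runs, read as a per-run probability
COST_PER_ATTEMPT = 65   # GBP, the per-attempt figure quoted in the article

for n in (1, 3, 5, 10):
    print(f"{n:>2} attempts: {p_at_least_one(P_SINGLE, n):.0%} chance of at least one completion")

print(f"expected spend per completion: about £{expected_cost_per_success(COST_PER_ATTEMPT, P_SINGLE):.0f}")
```

On those assumptions, ten attempts give roughly a 97% chance of at least one full completion, at an expected spend of a little over £200 per success. The real figures will differ, but the order of magnitude is the point.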

There's a compounding dynamic the AISI flagged that I think gets underappreciated: scaling inference-time compute improves attack performance significantly, and doing so requires no special sophistication. You don't need custom scaffolding, expert prompting, or specialist tools. You just give the model more tokens. That means attacker capability increasingly scales with budget and patience, not technical skill. The barrier to entry is collapsing from the top down.

The honest framing here is that this isn't a story about AI going rogue. It's a story about a dual-use capability crossing a measurable milestone, documented by a government institution doing exactly the work it was set up to do. The AISI and NCSC published their findings, flagged the implications, and called on defenders to act before the detection window closes.

That's the part I keep coming back to. Right now, frontier AI attacks tend to generate noisy security alerts, which makes them relatively detectable. That window is narrowing. Every four months, the capability roughly doubles. The defenders who treat this as a future problem rather than a current one are betting that the doubling stalls. Based on everything the AISI has published, that bet looks increasingly hard to justify.
