OpenAI Shipped a Cyber Model That Writes Exploits. The Vetting Is the Point.

On June 22, OpenAI shipped the full version of GPT-5.5-Cyber, a model explicitly designed to generate working exploits, trace attack paths through codebases, validate whether vulnerabilities are reachable, and produce patches. It scored 85.6% on CyberGym and 39.5% on ExploitGym. That second number is the one to sit with: ExploitGym tests whether a model can take a known vulnerability and convert it into code that achieves unauthorized code execution. The previous GPT-5.5 scored 25.95%. The jump is not incremental.
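For scale, here is the arithmetic behind that jump as a short Python sketch. It uses only the two ExploitGym scores quoted above. The variable names are mine, and nothing here reflects how the benchmark itself is scored.

```python
# ExploitGym scores as quoted in this piece (percent of tasks solved).
gpt_55_score = 25.95        # GPT-5.5
gpt_55_cyber_score = 39.5   # GPT-5.5-Cyber

# Absolute gain in percentage points.
abs_gain = gpt_55_cyber_score - gpt_55_score

# Relative gain over the GPT-5.5 baseline, as a percentage.
rel_gain = abs_gain / gpt_55_score * 100

print(f"absolute gain: {abs_gain:.2f} percentage points")  # 13.55
print(f"relative gain: {rel_gain:.1f}%")                   # 52.2%
```

Put differently, the Cyber variant solves roughly half again as many ExploitGym tasks as the baseline.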

OpenAI is clear that the model is not public. It gates access through the Trusted Access for Cyber program: critical infrastructure operators, security vendors, national CERTs, vetted researchers. You cannot sign up on a Pro subscription. The top tier requires hardware-backed authentication from every individual user. The company also launched the Daybreak Cyber Partner Program the same day, letting 30 security vendors embed GPT-5.5 capabilities inside customer-facing products for the first time.

Here is where I am genuinely uncertain. OpenAI's framing is that powerful cyber capability is coming regardless, so the question is whether defenders or attackers get there first. That logic is coherent. If a frontier model can find and patch a 29-year-old flaw in a widely deployed web proxy, the people who benefit most from fast deployment are the maintainers who have been drowning in AI-generated bug reports with no bandwidth to fix them.

But the ExploitGym number matters structurally. The gap between GPT-5.5 and GPT-5.5-Cyber on that benchmark is not primarily about intelligence. OpenAI is explicit: the model is "the same underlying GPT-5.5 with safety classifiers tuned to allow authorized defensive workflows." The capability was already there. The question was always what the guardrails would permit. GPT-5.5-Cyber is essentially GPT-5.5 with specific refusals turned off for people who can prove they belong to an approved organization.

That is the honest description of what they shipped. It is also a reasonable design choice. The alternative is leaving defenders with a hobbled model while attackers use the same base architecture with their own fine-tunes or jailbreaks. OpenAI's answer is to build an access program that is strict enough to matter: vetting, audit logging, scoped use cases, hardware authentication. Whether it holds under adversarial pressure from insiders, credential theft, or social engineering is a different question, and one the Canadian Centre for Cyber Security essentially flagged in May when it warned that AI-driven exploitation may now outpace vendors' capacity to publish corrective measures.
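To make the shape of such a program concrete, here is a minimal Python sketch of a gate that combines the four controls named above: vetting, scoped use cases, hardware authentication, and audit logging. Everything in it is hypothetical. The class names, field names, and scope strings are invented for illustration and do not describe OpenAI's actual implementation, which has not been published in this detail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical scope names, invented for this illustration only.
ALLOWED_USE_CASES = {"patch_generation", "reachability_analysis", "exploit_validation"}


@dataclass(frozen=True)
class AccessRequest:
    user_id: str
    org_vetted: bool             # organization passed vetting (assumed flag)
    hardware_key_verified: bool  # per-user hardware-backed authentication succeeded
    use_case: str                # declared purpose of the request


@dataclass
class AccessGate:
    audit_log: list = field(default_factory=list)

    def authorize(self, req: AccessRequest) -> bool:
        """Allow a request only if every control passes; log every decision."""
        checks = {
            "org_vetted": req.org_vetted,
            "hardware_key_verified": req.hardware_key_verified,
            "use_case_in_scope": req.use_case in ALLOWED_USE_CASES,
        }
        allowed = all(checks.values())
        # Denials are logged as well as approvals, so failed attempts stay visible.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user_id": req.user_id,
            "use_case": req.use_case,
            "allowed": allowed,
            "failed_checks": [name for name, ok in checks.items() if not ok],
        })
        return allowed


gate = AccessGate()
ok = gate.authorize(AccessRequest("analyst-1", True, True, "patch_generation"))
denied = gate.authorize(AccessRequest("analyst-2", True, False, "patch_generation"))
print(ok, denied)  # True False
```

Note what the sketch cannot do: every check trusts its inputs. A stolen hardware key or a compromised insider passes `authorize` exactly as a legitimate user would, which is the weakness the paragraph above points at.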

The Codex Security side of the release is, in some ways, more interesting for everyday developers. Since its research preview in March, it has scanned over 30 million commits across more than 30,000 codebases. Human reviewers marked over 70,000 findings as fixed, and more than 500,000 were resolved automatically. Numbers that large suggest something real is happening at the infrastructure level, separate from the controlled-access story.

What I keep coming back to: a model that produces exploit code and a model that produces patches are the same model. The distinction is entirely operational. OpenAI built a permission structure around that fact and called it safety. That is not sarcasm. It may be the only honest approach available. But it means the safety story for GPT-5.5-Cyber is the access program, not the weights. If the access program has a hole, the capability is already out.
