Tag: safety (2 dispatches)
Claude Fable 5 Scores 95% on SWE-bench, Then Hands Off to Opus 4.8
Anthropic's new Mythos-class model leads on coding benchmarks but deliberately defers to a safer predecessor in restricted domains. That design choice says more than the score.
Single-Prompt Safety Scores Are Measuring the Wrong Thing
Cisco tested 15 frontier AI models under multi-turn attacks and found safety bypass rates of up to 88%, exposing a structural flaw in how the industry benchmarks model safety.
