Summary from rdi.berkeley.edu

Elevator Pitch

UC Berkeley researchers built an automated exploit agent that achieved near-perfect scores on eight major AI agent benchmarks without solving tasks, arguing benchmarks must be redesigned for adversarial robustness.

Key Takeaways

All audited benchmarks were exploitable via evaluation-harness weaknesses (shared agent/evaluator environments, answer leakage, weak validation, and LLM-judge prompt injection).
The paper catalogs recurring failure modes (“Seven Deadly Patterns”) and proposes an “Agent-Eval Checklist” to harden benchmark methodology.
The authors are productizing their scanner into “BenchJack,” intended as a penetration test for benchmark pipelines.

Most Memorable Quotes

“We built an automated scanning agent that systematically audited eight among the most prominent AI agent benchmarks ... and discovered that every single one can be exploited to achieve near-perfect scores without solving a single task.”
“Zero tasks solved. Zero LLM calls (in most cases). Near-perfect scores.”
“Don’t trust the number. Trust the methodology.”

Source URL•Original: 3797 words•Summary: 153 words