Elevator Pitch
- UC Berkeley researchers built an automated exploit agent that achieved near-perfect scores on eight major AI agent benchmarks without solving tasks, arguing benchmarks must be redesigned for adversarial robustness.
Key Takeaways
- All audited benchmarks were exploitable via evaluation-harness weaknesses (shared agent/evaluator environments, answer leakage, weak validation, and LLM-judge prompt injection).
- The paper catalogs recurring failure modes (“Seven Deadly Patterns”) and proposes an “Agent-Eval Checklist” to harden benchmark methodology.
- The authors are productizing their scanner into “BenchJack,” intended as a penetration test for benchmark pipelines.
Most Memorable Quotes
- “We built an automated scanning agent that systematically audited eight among the most prominent AI agent benchmarks ... and discovered that every single one can be exploited to achieve near-perfect scores without solving a single task.”
- “Zero tasks solved. Zero LLM calls (in most cases). Near-perfect scores.”
- “Don’t trust the number. Trust the methodology.”
Source URL•Original: 3797 words
•Summary: 153 words