This AI Detection Benchmark Is Almost Too Good. That’s the Problem.
The PES Benchmark v0.2 achieves a staggering Cohen’s d of 10.4 — near-perfect separation between real and AI-generated motion. But this extreme performance is a warning, not a victory. As detection improves, AI generators learn to fix their tells. The arms race is real, and this benchmark may be the last snapshot of a winning detection strategy.
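To see why a d of 10.4 reads as near-perfect separation, here is a minimal Python sketch. It assumes the textbook definition of Cohen's d (difference in means divided by the pooled standard deviation). For the overlap and accuracy figures, it also assumes that the real and AI-generated score distributions are both normal with equal variance. The PES benchmark's own scoring may not satisfy these assumptions. The function names are hypothetical illustrations, not part of the benchmark.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d: difference in sample means divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances (n - 1 denominators), weighted by degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def overlap_coefficient(d):
    """Shared area of two equal-variance normal curves whose means are |d| SDs apart.

    Assumes normality: OVL = 2 * Phi(-|d| / 2) = erfc(|d| / (2 * sqrt(2))).
    """
    return math.erfc(abs(d) / (2 * math.sqrt(2)))


def midpoint_accuracy(d):
    """Balanced-class accuracy of a threshold halfway between the means: Phi(|d| / 2).

    Same equal-variance normal assumption as overlap_coefficient.
    """
    return 1 - 0.5 * math.erfc(abs(d) / (2 * math.sqrt(2)))


if __name__ == "__main__":
    # 0.8 is Cohen's conventional "large" effect; 10.4 is the figure reported for PES v0.2.
    for d in (0.8, 2.0, 10.4):
        print(f"d={d:>5}: overlap={overlap_coefficient(d):.3g}, "
              f"accuracy={midpoint_accuracy(d):.7f}")
```

Under these assumptions, a "large" effect of 0.8 still leaves the two curves sharing about two-thirds of their area. At d = 10.4 the shared area falls to roughly two parts in ten million. That gap is why the result looks less like a typical detector and more like a generator with an obvious, fixable tell.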