RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

arXiv:2603.11337v1 Announce Type: new LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent can increase the reported score by compromising the evaluation pipeline rather than improving the model. We