Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents

arXiv:2510.04491v2 Announce Type: replace

Despite rapid progress in building conversational AI agents, their robustness remains largely untested. Small shifts in user behavior, such as being impatient, incoherent, or skeptical, can cause sharp drops in agent performance, revealing how brittle current AI agents are. Today's benchmarks fail to capture this fragility: agents may perform well under standard evaluations but degrade spectacularly in more realistic and varied settings. We address this robustness testing gap by.