When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

arXiv:2604.00892v1 Announce Type: new As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions mid-task, such as adding requirements or revising goals, is becoming a core requirement for realistic deployment. However, existing benchmarks largely assume uninterrupted agent behavior or study interruptions only in short, unconstrained language tasks.