AI RESEARCH

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

arXiv cs.AI

arXiv:2604.13531v1

Graphical User Interface (GUI) agents show strong capabilities for automating web tasks, but existing interactive benchmarks primarily target benign, predictable consumer environments. Their effectiveness in high-stakes, investigative domains such as authentic e-commerce risk management remains underexplored. To bridge this gap, we present RiskWebWorld, the first highly realistic interactive benchmark for evaluating GUI agents in e-commerce risk management.