WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents

arXiv:2508.13024v3 Announce Type: replace LLM-based web agents have the potential to automate long-running web tasks, such as searching for products in multiple e-shops and subsequently ordering the cheapest products that meet the user's needs. Benchmarks for evaluating web agents require agents to perform tasks either online, using the live Web, or offline, using simulated environments; the latter allows for the exact reproduction of the experimental setup.