WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
arXiv cs.LG
arXiv:2604.06367v1 Announce Type: cross
Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we ...