AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

arXiv:2603.21357v1. LLM agents fail on the majority of real-world tasks: GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and reaches a pass rate below 55% on ToolBench (Zhou, 2024; Qin, 2024). Yet failed trajectories are routinely discarded, wasting the dominant source of collected experience. We