MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

arXiv:2603.00680v2 Announce Type: replace Long-horizon agents face the challenge of context size that grows during interaction with the environment, which degrades performance and stability. Existing methods typically