SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

arXiv:2604.08988v1 Announce Type: new Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic amnesia, failing to accumulate experience or optimize strategies across task boundaries. While the Self-Evolving Agent (SEA) paradigm has been previously proposed, this paper contributes a new formal definition of SEA grounded in digital embodiment and continuous cross-task evolution, and