AI RESEARCH

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

arXiv cs.AI

arXiv:2603.24631v1 Announce Type: cross Code agents can autonomously resolve GitHub issues, yet when they fail, current evaluation provides no visibility into where or why. Metrics such as Pass collapse an entire execution into a single binary outcome, making it difficult to identify where and why the agent went wrong. To address this limitation, we