AI RESEARCH
TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis
arXiv CS.AI
•
ArXi:2603.24631v1 Announce Type: cross Code agents can autonomously resolve GitHub issues, yet when they fail, current evaluation provides no visibility into where or why. Metrics such as Pass collapse an entire execution into a single binary outcome, making it difficult to identify where and why the agent went wrong. To address this limitation, we