AI RESEARCH
Beyond Completion: Probing Cumulative State Tracking to Predict LLM Agent Performance
arXiv CS.AI
•
ArXi:2603.27343v1 Announce Type: new Task-completion rate is the standard proxy for LLM agent capability, but models with identical completion scores can differ substantially in their ability to track intermediate state. We