AI RESEARCH

Beyond Completion: Probing Cumulative State Tracking to Predict LLM Agent Performance

arXiv CS.AI

ArXi:2603.27343v1 Announce Type: new Task-completion rate is the standard proxy for LLM agent capability, but models with identical completion scores can differ substantially in their ability to track intermediate state. We