Measuring Intent Comprehension in LLMs

arXiv:2506.16584v2 Announce Type: replace-cross

People judge interactions with large language models (LLMs) as successful when outputs match what they want, not what they type. Yet LLMs are trained to predict the next token from text input alone, not from the underlying intent. Because written language is an imperfect proxy for intent, and correlations between phrasing and desired outcomes can break down in