Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning

arXiv:2605.07251v1 Announce Type: new Large Language Models (LLMs) have become increasingly capable as tool-using agents, with benchmarks spanning diverse general agentic tasks. Yet rigorous evaluation of scientific tool use remains limited. In chemistry, recent agents can plan syntheses and invoke domain-specific tools, but evaluations often rely on curated demonstrations, expert assessment, or LLM-as-judge scoring rather than exact, judge-free ground truth.