SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

arXiv:2603.20253v1 Announce Type: cross Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we