When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning

arXiv:2604.08281v1 Announce Type: new Large reasoning models (LRMs) have achieved strong performance gains by scaling test-time computation, but because of the inherent limitations of the underlying language models, they still fall short on tasks that require precise computation or extensive knowledge. Tool-Integrated Reasoning (TIR) has emerged as a promising paradigm that incorporates tool calls and their execution into the reasoning trajectory.