AI RESEARCH

Measuring all the noises of LLM Evals

arXiv CS.AI

arXiv:2512.21326v2 Announce Type: replace-cross Separating signal from noise is central to experiments. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteristics. We clearly define and measure three types of noise: prediction noise from generating different answers to a given question, data noise from sampling questions, and their combined total noise, which follows from the law of total variance.
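
The decomposition named in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a hypothetical score table with one row per question and one column per sampled answer (the same number of samples for every question), and it uses plain population variances. Under those assumptions the law of total variance, Var(score) = E[Var(score | question)] + Var(E[score | question]), holds exactly. The function name `decompose_noise` and the example scores are made up for illustration.

```python
from statistics import fmean, pvariance


def decompose_noise(scores):
    """Split the variance of eval scores into prediction and data components.

    scores: list of rows, one row per question, each row holding the scores
    of repeated samples on that question (assumed: equal row lengths).
    """
    if not scores or not scores[0]:
        raise ValueError("scores must be a non-empty table")
    if any(len(row) != len(scores[0]) for row in scores):
        raise ValueError("every question needs the same number of samples")

    # Prediction noise: variance across samples within a question,
    # averaged over questions, i.e. E[Var(score | question)].
    prediction_var = fmean([pvariance(row) for row in scores])

    # Data noise: variance of the per-question mean score across questions,
    # i.e. Var(E[score | question]).
    question_means = [fmean(row) for row in scores]
    data_var = pvariance(question_means)

    # Total noise: variance over every (question, sample) score.
    total_var = pvariance([s for row in scores for s in row])

    return {"prediction": prediction_var, "data": data_var, "total": total_var}


# Hypothetical 0/1 correctness scores: 3 questions, 4 samples each.
example_scores = [
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
noise = decompose_noise(example_scores)
print(noise)
```

With only a few samples per question, the variance of the per-question means also absorbs some prediction noise, so this plug-in data term is best read as an upper-leaning estimate rather than a clean measurement.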