AI RESEARCH

Active Testing of Large Language Models via Approximate Neyman Allocation

arXiv cs.AI • May 12, 2026

arXiv:2605.10075v1. Large language models (LLMs) require reliable evaluation from pre-