AI RESEARCH

Active Testing of Large Language Models via Approximate Neyman Allocation

arXiv CS.AI

arXiv:2605.10075v1 Announce Type: new Large language models (LLMs) require reliable evaluation from pre-