AI RESEARCH
Active Testing of Large Language Models via Approximate Neyman Allocation
arXiv cs.AI
arXiv:2605.10075v1 Announce Type: new Large language models (LLMs) require reliable evaluation from pre-