AI RESEARCH
LLM Benchmark Datasets Should Be Contamination-Resistant
arXiv CS.AI
arXiv:2605.19999v1 Announce Type: cross Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pre