AI RESEARCH

LLM Benchmark Datasets Should Be Contamination-Resistant

arXiv CS.AI

arXiv:2605.19999v1 Announce Type: cross Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pre