LLM Benchmark Datasets Should Be Contamination-Resistant

arXiv:2605.19999v1 Announce Type: cross Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pre