Benchmark Leakage Trap: Can We Trust LLM-based Recommendation?

arXiv:2602.13626v2 Announce Type: replace The expanding integration of Large Language Models (LLMs) into recommender systems poses critical challenges to evaluation reliability. This paper identifies and investigates a previously overlooked issue: benchmark data leakage in LLM-based recommendation. This phenomenon occurs when LLMs are exposed to and potentially memorize benchmark datasets during pre-