AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

arXiv:2603.26680v1 Announce Type: cross As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a critical frontier. However, progress is currently bottlenecked by the absence of a gold-standard evaluation benchmark. Existing benchmarks either overlook the management of personalized information, which is critical for personalization, or rely heavily on synthetic dialogues, which exhibit an inherent distribution gap from real-world dialogue. To bridge this gap, we