AI RESEARCH

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

arXiv cs.AI

arXiv:2605.18899v1 Announce Type: cross

Generative LLM-based recommenders (LLM-Rec) require continual updates after deployment, yet deployment logs provide only policy-shaped contextual bandit feedback: outcomes are observed only for items that a prior serving policy exposed. This induces exposure bias and yields partial, asymmetric signals, consisting of relatively reliable positive responses and ambiguous no-responses.
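To make "policy-shaped" feedback and exposure bias concrete, here is a minimal sketch under stated assumptions. It is not the paper's method. It simulates logs in which an outcome exists only for the item a hypothetical serving policy exposed, then compares the naive log average with inverse propensity scoring (IPS), a standard off-policy estimator swapped in here purely for illustration. The names `LoggedEvent`, `serving_policy`, `uniform_policy`, `simulate_logs`, and `ips_value`, as well as the click rates, are all hypothetical.

```python
import random
from dataclasses import dataclass

# Illustrative sketch only: IPS is a standard off-policy estimator, NOT the
# method proposed in the paper. All names, policies, and rates are hypothetical.


@dataclass
class LoggedEvent:
    context: int       # hypothetical user-segment id
    item: int          # item the prior serving policy exposed
    propensity: float  # serving policy's probability of exposing that item
    clicked: bool      # True = positive response; False = ambiguous no-response


def serving_policy(context: int, n_items: int) -> list[float]:
    """Hypothetical prior serving policy that heavily favors item 0 (ignores context for brevity)."""
    return [0.7] + [0.3 / (n_items - 1)] * (n_items - 1)


def uniform_policy(context: int, n_items: int) -> list[float]:
    """Hypothetical candidate update to evaluate: expose every item equally."""
    return [1.0 / n_items] * n_items


def simulate_logs(true_ctr: list[float], n_events: int, seed: int = 0) -> list[LoggedEvent]:
    """Generate bandit feedback: an outcome is recorded only for the exposed item."""
    rng = random.Random(seed)
    n_items = len(true_ctr)
    logs = []
    for _ in range(n_events):
        context = rng.randrange(4)
        probs = serving_policy(context, n_items)
        item = rng.choices(range(n_items), weights=probs)[0]
        clicked = rng.random() < true_ctr[item]
        logs.append(LoggedEvent(context, item, probs[item], clicked))
    return logs


def naive_value(logs: list[LoggedEvent]) -> float:
    """Average click rate in the logs; reflects the serving policy, not a new one."""
    return sum(e.clicked for e in logs) / len(logs)


def ips_value(logs: list[LoggedEvent], policy, n_items: int) -> float:
    """Inverse propensity scoring: reweight each logged outcome by pi(a|x) / mu(a|x).

    No-responses are treated as reward 0, an assumption that the ambiguity of
    no-responses may violate (e.g. an exposed item the user never noticed).
    """
    total = 0.0
    for e in logs:
        weight = policy(e.context, n_items)[e.item] / e.propensity
        total += weight * e.clicked
    return total / len(logs)


if __name__ == "__main__":
    true_ctr = [0.02, 0.10, 0.10, 0.10]  # hypothetical per-item click rates
    logs = simulate_logs(true_ctr, n_events=100_000)
    print("naive:", round(naive_value(logs), 4))
    print("ips:  ", round(ips_value(logs, uniform_policy, len(true_ctr)), 4))
```

Under these made-up rates, the naive estimate should sit near the serving policy's own value (0.7 × 0.02 + 0.3 × 0.10 = 0.044), while IPS should land near the uniform policy's true value (0.08). The reweighting is unbiased only when every item the new policy might show had nonzero exposure probability under the serving policy, which is one reason partial, policy-shaped logs are hard to learn from.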