Response-Aware User Memory Selection for LLM Personalization

arXiv:2604.14473v1

A common approach to personalization in large language models (LLMs) is to incorporate a subset of the user memory into the prompt at inference time to guide the model's generation. Existing methods select these subsets primarily by the similarity between user memory items and the input query, ignoring how the memory items actually affect the model's response distribution.
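To make the similarity-based baseline concrete, the following is a minimal sketch, not the paper's method or any specific system's implementation. It assumes a toy bag-of-words cosine similarity in place of a learned embedding, and the names `bow`, `cosine`, `select_by_similarity`, and `build_prompt` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (toy stand-in for an embedding; an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_by_similarity(memory, query, k=2):
    """Return the k memory items most similar to the query.

    Ranking uses only query-item similarity; it never looks at how an
    item would change the model's response.
    """
    q = bow(query)
    ranked = sorted(memory, key=lambda item: cosine(bow(item), q), reverse=True)
    return ranked[:k]


def build_prompt(selected, query):
    """Place the selected memory items ahead of the query in the prompt."""
    lines = ["User memory:"] + [f"- {m}" for m in selected] + ["", f"Query: {query}"]
    return "\n".join(lines)


memory = [
    "likes spicy thai food",
    "works as a nurse on night shifts",
    "allergic to peanuts",
]
query = "recommend thai food for dinner"
print(build_prompt(select_by_similarity(memory, query, k=1), query))
```

In this toy setup the item "allergic to peanuts" shares no words with the query and so scores zero, even though it could plausibly change what a good recommendation looks like. That is the kind of gap a purely similarity-driven selector can leave.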