Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

arXiv:2604.02324v1 Announce Type: cross Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes each new token's embedding as the mean of the existing vocabulary embeddings, then relies on supervised fine-tuning to learn the tokens' representations.
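
The mean-initialization baseline described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the function name `mean_init_new_tokens`, the list-of-lists embedding table, and the toy values are all assumptions made for the example. In practice the table would be a framework tensor.

```python
from statistics import fmean


def mean_init_new_tokens(embeddings, num_new_tokens):
    """Return a new table with num_new_tokens rows appended.

    Each appended row is the column-wise mean of the existing rows.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if not embeddings:
        raise ValueError("existing embedding table is empty")
    dim = len(embeddings[0])
    # Average each embedding dimension over the existing vocabulary.
    mean_vec = [fmean(row[d] for row in embeddings) for d in range(dim)]
    # Copy the mean vector per new token so rows can later be updated independently.
    return embeddings + [list(mean_vec) for _ in range(num_new_tokens)]


# Toy 3-token vocabulary with 2-dimensional embeddings (made-up values).
vocab = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
extended = mean_init_new_tokens(vocab, num_new_tokens=2)
print(extended[3:])  # the two newly added rows
```

By construction, every new token starts from the same vector under this scheme, so any distinction between the new tokens has to come from the subsequent fine-tuning.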