Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations

arXiv:2604.18146v1 Announce Type: cross

Large language models (LLMs) have recently advanced recommendation systems (RSs), and a growing body of work explores how to integrate LLMs into industrial RSs. While most approaches deploy LLMs offline to generate and pre-cache augmented representations for RSs, high-dimensional representations from LLMs