Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

arXiv:2510.08907v4

Context compression is an advanced technique that accelerates large language model (LLM) inference by converting long inputs into compact representations. Existing methods primarily rely on autoencoding tasks to train special compression tokens to represent contextual semantics. While autoencoding tasks enable compression tokens to acquire compression capabilities, we observe that such capabilities may conflict with actual downstream task requirements, preventing the models from learning the features beneficial for real-world usage.