PoC: Performance-oriented Context Compression for Large Language Models via Performance Prediction

arXiv:2603.19733v1 Announce Type: new While context compression can mitigate the growing inference costs of Large Language Models (LLMs) by shortening contexts, existing methods that specify a target compression ratio or length suffer from unpredictable performance degradation, hindering their reliable deployment. We