AI RESEARCH
QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference
arXiv CS.AI
arXiv:2604.08585v1 Announce Type: cross
Cache fusion accelerates the generation process of LLMs equipped with RAG through KV caching and selective token recomputation, thereby reducing computational cost and improving efficiency. However, existing methods primarily rely on local perspectives for token selection and lack global awareness from the user query. Utilizing this global awareness is challenging due to the high cost of obtaining context-aware query representations and the strict pipeline constraints required for efficient attention analysis. Thus, this nstration
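
As a rough illustration of what query-guided token selection for selective recomputation could look like, the toy sketch below scores cached context tokens by their attention weight under a single query vector and marks the top fraction for recomputation, reusing the cached KV entries for the rest. This is not the QCFuse algorithm, whose details are not given in this listing. The function names (`query_attention`, `select_for_recompute`), the `ratio` parameter, and the 2-dimensional key values are all assumptions made up for the example.

```python
import math
from typing import List, Sequence


def query_attention(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Softmax of scaled dot products between one query vector and each cached key."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_for_recompute(scores: Sequence[float], ratio: float) -> List[int]:
    """Return sorted indices of the highest-scoring tokens, ceil(ratio * n) of them."""
    budget = min(len(scores), math.ceil(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Toy data: four cached context tokens with 2-dimensional keys (hypothetical values).
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]

scores = query_attention(query, keys)
recompute = select_for_recompute(scores, ratio=0.5)  # assumed recomputation budget
reuse = [i for i in range(len(keys)) if i not in recompute]
print("recompute:", recompute, "reuse cached KV:", reuse)
```

With these toy values, tokens 0 and 2 align most closely with the query and are selected for recomputation, while tokens 1 and 3 keep their cached KV entries. A real system would derive the scores from the model's own attention layers rather than from hand-written vectors.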