AI RESEARCH
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
arXiv CS.AI
arXiv:2601.19178v2 Announce Type: replace Sequential recommendation models are widely deployed in applications, yet they face stringent latency requirements. Mainstream models leverage the Transformer attention mechanism to improve performance, but its computational complexity grows with the sequence length, which creates a latency challenge for long sequences. Consequently, KV cache technology has recently been explored in sequential recommendation systems to reduce inference latency. However, KV cache