CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation

arXiv:2601.19178v2

Sequential recommendation models are widely used in practical applications, yet they face stringent latency requirements. Mainstream models leverage the Transformer attention mechanism to improve performance, but its computational cost grows with sequence length, which makes inference latency a challenge for long sequences. Consequently, KV cache techniques have recently been explored in sequential recommendation systems to reduce inference latency. However, KV cache