Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

r/LocalLLaMA
Generative AI
