Differential Transformer V2
Hugging Face Blog
Tianzhu Ye, Li Dong, Yutao Sun, Furu Wei

Contents: Abstract; Code; Motivation; Faster Decoding & No Custom Kernels; Softmax Magnitude Constraint; Beyond Softmax Constraint & Elimination of Attention Sinks; Experimental Observations; Discussions; Construction of Differential Operation; Design Ablations; Miscellaneous