CRoPE: Efficient Parametrization of Rotary Positional Embedding
arXiv CS.LG
arXiv:2601.02728v2 Announce Type: replace Rotary positional embedding has become the state-of-the-art approach for encoding position information in transformer-based models. While it is often expressed succinctly in complex linear algebra, we note that the actual implementation of the $Q/K/V$ projections is not equivalent to a complex linear transformation. We argue that a complex linear transformation is a natural parametrization and saves nearly 50\% of the parameters within the attention block. We show empirically that removing this redundancy has negligible impact on model performance.
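The parameter-count argument can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the helper names (`rope_real`, `rope_complex`, `complex_as_real`), the toy dimensions, and the frequency schedule `10000 ** (-2i/d)` are assumptions chosen for illustration. The sketch checks two things: that the pairwise rotation used by rotary positional embedding equals multiplication by a unit complex number, and that a complex linear map on $d/2$ complex coordinates is a structured special case of a real $d \times d$ projection with half as many real parameters.

```python
import cmath
import math
import random


def rope_real(x, pos, theta):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * theta[i]."""
    out = []
    for i, t in enumerate(theta):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * t), math.sin(pos * t)
        out += [a * c - b * s, a * s + b * c]
    return out


def to_complex(x):
    """View interleaved (re, im) pairs as a list of complex numbers."""
    return [complex(x[2 * i], x[2 * i + 1]) for i in range(len(x) // 2)]


def rope_complex(z, pos, theta):
    """Same rotation, written as multiplication by exp(i * pos * theta)."""
    return [zi * cmath.exp(1j * pos * t) for zi, t in zip(z, theta)]


def matvec(W, v):
    """Plain matrix-vector product on nested lists (real or complex)."""
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]


def complex_as_real(Wc):
    """Expand an m x n complex matrix into the equivalent 2m x 2n real
    matrix acting on interleaved (re, im) coordinates."""
    rows = []
    for row in Wc:
        re_row, im_row = [], []
        for w in row:
            re_row += [w.real, -w.imag]  # real part of w * (a + ib)
            im_row += [w.imag, w.real]   # imaginary part of w * (a + ib)
        rows += [re_row, im_row]
    return rows


def close(u, v, tol=1e-9):
    return len(u) == len(v) and all(abs(a - b) < tol for a, b in zip(u, v))


random.seed(0)
d_in, d_out = 8, 8  # toy sizes, assumed for illustration
# Assumed frequency schedule (a common convention, not taken from the abstract).
theta = [10000 ** (-2 * i / d_out) for i in range(d_out // 2)]

x = [random.gauss(0, 1) for _ in range(d_in)]
Wc = [[complex(random.gauss(0, 1), random.gauss(0, 1))
       for _ in range(d_in // 2)] for _ in range(d_out // 2)]
W_real = complex_as_real(Wc)

# The real pairwise rotation agrees with complex multiplication.
rope_matches = close(to_complex(rope_real(x, 5, theta)),
                     rope_complex(to_complex(x), 5, theta))
# The structured real matrix reproduces the complex linear map.
proj_matches = close(to_complex(matvec(W_real, x)),
                     matvec(Wc, to_complex(x)))

real_params = d_out * d_in                         # unconstrained real projection
complex_params = 2 * (d_out // 2) * (d_in // 2)    # real + imaginary parts
print(rope_matches, proj_matches, complex_params / real_params)
```

A generic real projection has no reason to take the block structure produced by `complex_as_real`, which is one way to read the abstract's observation that standard projections are not complex linear. Under the assumptions above, constraining a projection to that structure halves its real parameter count.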