AI RESEARCH
Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
arXiv cs.LG
arXiv:2603.11611v1. Rotary Positional Embedding (RoPE) is a common choice in transformer architectures for encoding relative positional information. Although earlier work has examined omitting RoPE in specific layers, the effect of varying the fraction of hidden dimensions that receive rotary transformations remains largely unexplored. Rotating only a fraction of the dimensions can yield substantial memory savings, which become especially significant at long context lengths. We find up to 10x memory savings over the standard RoPE cache while achieving comparable final loss.
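To make the idea of rotating only a fraction of the dimensions concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names `rope_cache` and `apply_partial_rope`, the choice to rotate the first `rot_dim` entries of each vector, the interleaved pairing of dimensions, and the frequency base of 10000 are all assumptions made for illustration. The "cache" here means the precomputed cosine and sine tables, whose size scales with the number of rotated dimensions.

```python
import math

def rope_cache(seq_len, rot_dim, base=10000.0):
    """Return cos/sin tables, each of size seq_len x (rot_dim // 2).

    Hypothetical helper: one rotation frequency per pair of rotated dimensions.
    """
    if rot_dim % 2 != 0:
        raise ValueError("rot_dim must be even")
    inv_freq = [base ** (-(2 * i) / rot_dim) for i in range(rot_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
    return cos, sin

def apply_partial_rope(x, cos, sin):
    """Rotate the first rot_dim entries of each row of x; pass the rest through.

    x is a list of seq_len vectors of length head_dim. The rotated width is
    implied by the cache: rot_dim = 2 * len(cos[0]).
    """
    out = []
    for pos, vec in enumerate(x):
        row = list(vec)  # copy, so unrotated dimensions are kept as-is
        for i, (c, s) in enumerate(zip(cos[pos], sin[pos])):
            a, b = vec[2 * i], vec[2 * i + 1]
            row[2 * i] = a * c - b * s      # 2-D rotation of the pair (a, b)
            row[2 * i + 1] = a * s + b * c
        out.append(row)
    return out

# Assumed example sizes: rotate 16 of 64 dimensions (a rotary fraction of 0.25).
head_dim, seq_len = 64, 128
rot_dim = 16
cos, sin = rope_cache(seq_len, rot_dim)
full_cos, _ = rope_cache(seq_len, head_dim)

# Ratio of full-RoPE cache entries to partial-RoPE cache entries.
ratio = (len(full_cos) * len(full_cos[0])) / (len(cos) * len(cos[0]))
```

Under these assumptions the cache shrinks by a factor of `head_dim / rot_dim`, which is 4 for the sizes chosen here. The rotary fraction and model dimensions behind the reported 10x figure are not stated in the abstract, so this sketch does not attempt to reproduce that number.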