LinearARD: Linear-Memory Attention Distillation for RoPE Restoration

arXiv:2604.00004v1 Announce Type: cross The extension of context windows in Large Language Models is typically facilitated by scaling positional encodings followed by lightweight Continual Pre-