AI RESEARCH

Short Data, Long Context: Distilling Positional Knowledge in Transformers

arXiv cs.LG

arXiv:2604.06070v1 Announce Type: cross Extending the context window of language models typically requires expensive long-context pre-