AI RESEARCH
Short Data, Long Context: Distilling Positional Knowledge in Transformers
arXiv cs.LG
arXiv:2604.06070v1 Announce Type: cross
Extending the context window of language models typically requires expensive long-context pre-