AI RESEARCH
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
arXiv CS.CL
arXiv:2605.12227v1 Announce Type: new Adapting large language models (LLMs) to long-context tasks requires post-