AI RESEARCH

Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

arXiv cs.CL

arXiv:2605.12227v1 Announce Type: new
Adapting large language models (LLMs) to long-context tasks requires post-