AI RESEARCH
Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
arXiv cs.AI
arXiv:2603.19266v1 Announce Type: cross Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled models frequently suffer from superficial pattern memorization and subpar generalization. To overcome these limitations, we