Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion

arXiv:2603.19266v1 Announce Type: cross Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled models frequently suffer from superficial pattern memorization and subpar generalization. To overcome these limitations, we