Critique-Guided Distillation for Robust Reasoning via Refinement

arXiv:2505.11628v4 Announce Type: replace-cross Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise