AI RESEARCH
Critique-Guided Distillation for Robust Reasoning via Refinement
arXiv CS.LG
arXiv:2505.11628v4 Announce Type: replace-cross Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise