Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning

arXiv:2605.09270v1

Supervised Fine-Tuning (SFT) is widely used for task-specific adaptation, yet recent work shows that it systematically undermines reasoning generalization. We argue that the root cause is not memorization itself but its target: vanilla SFT drives models to exploit and memorize spurious surface correlations in problem-solution pairs, leaving them brittle to superficial input variations.