Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
arXiv cs.AI
arXiv:2604.04037v1 Announce Type: cross
Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across