AI RESEARCH

Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory

arXiv CS.AI

arXiv:2604.04037v1 Announce Type: cross

Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across