When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models

arXiv:2603.26556v1 Announce Type: cross Converting a pretrained Transformer into an efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate on downstream multiple-choice benchmarks by ranking candidate answers by log-likelihood rather than requiring autoregressive generation, which can obscure important differences in model quality.
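
To make the distinction between the two evaluation modes concrete, here is a minimal sketch. It is not the paper's evaluation setup: `TOY_MODEL`, its tokens, and its probabilities are hypothetical and chosen purely for illustration. The toy model ranks the correct candidate first under log-likelihood scoring, yet greedy autoregressive decoding from the same model never produces it.

```python
import math

# Hypothetical toy "language model": P(next token | previous token).
# All tokens and probabilities are invented for illustration only.
TOY_MODEL = {
    "<q>": {"the": 0.5, "paris": 0.3, "rome": 0.2},
    "the": {"the": 0.9, "<eos>": 0.1},
    "paris": {"<eos>": 1.0},
    "rome": {"<eos>": 1.0},
}


def sequence_log_likelihood(prompt_token, candidate):
    """Sum of log P(token | previous token) over a fixed candidate sequence."""
    prev, total = prompt_token, 0.0
    for tok in candidate:
        p = TOY_MODEL.get(prev, {}).get(tok, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
        prev = tok
    return total


def rank_by_log_likelihood(prompt_token, candidates):
    """Multiple-choice style evaluation: pick the highest-scoring candidate."""
    return max(candidates, key=lambda c: sequence_log_likelihood(prompt_token, c))


def greedy_generate(prompt_token, max_new_tokens=5):
    """Generation style evaluation: the model must produce the answer itself."""
    prev, out = prompt_token, []
    for _ in range(max_new_tokens):
        dist = TOY_MODEL[prev]
        tok = max(dist, key=dist.get)  # most probable next token
        if tok == "<eos>":
            break
        out.append(tok)
        prev = tok
    return out


candidates = [["paris", "<eos>"], ["rome", "<eos>"]]
best = rank_by_log_likelihood("<q>", candidates)
generated = greedy_generate("<q>")
print("ranked answer:   ", best[0])
print("generated answer:", " ".join(generated))
```

Ranking only compares the supplied candidates against each other, so it selects `paris` even though the model assigns more probability to a degenerate continuation. Greedy decoding exposes that continuation directly, which is the kind of difference a log-likelihood benchmark can hide.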