AI RESEARCH

VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models

arXiv cs.LG

arXiv:2508.15229v3 (Announce Type: replace-cross)

Small Language Models (SLMs) offer computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for deployment on edge devices. A substantial portion of an SLM's memory footprint comes from vocabulary-related components, particularly the embeddings and the language modeling (LM) head, because both scale with the model's large vocabulary size. Existing static vocabulary pruning reduces memory usage, but its rigid, one-size-fits-all design causes information loss during the prefill stage and offers little flexibility.
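To make the memory argument concrete, the following is a minimal Python/NumPy sketch. It assumes an untied embedding table and LM head, fp16 weights (2 bytes per parameter), and illustrative dimensions (128,000 tokens, hidden size 2,048) that are hypothetical and not taken from the paper. It also sketches a generic static vocabulary pruning baseline of the kind the abstract criticizes. This is not VocabTailor's dynamic method. A fixed set of kept token ids selects rows of both matrices, so any input token outside that set has no embedding row, which is the prefill-stage information loss mentioned above. All function names are hypothetical.

```python
import numpy as np


def vocab_param_bytes(vocab_size: int, hidden_dim: int,
                      bytes_per_param: int = 2, tied: bool = False) -> int:
    """Bytes used by vocabulary-related weights (hypothetical helper).

    The embedding table is vocab_size x hidden_dim; an untied LM head
    adds a second matrix of the same shape.
    """
    n_matrices = 1 if tied else 2
    return n_matrices * vocab_size * hidden_dim * bytes_per_param


def static_prune(embedding: np.ndarray, lm_head: np.ndarray, keep_ids):
    """Generic static pruning: keep one fixed token subset for all inputs."""
    keep = np.asarray(sorted(set(keep_ids)), dtype=np.int64)
    # Map original token ids to their row index in the pruned matrices.
    old_to_new = {int(old): new for new, old in enumerate(keep)}
    return embedding[keep], lm_head[keep], old_to_new, keep


def remap_input(token_ids, old_to_new):
    """Remap prompt tokens; tokens outside the kept set are dropped (lost)."""
    kept, dropped = [], []
    for t in token_ids:
        if t in old_to_new:
            kept.append(old_to_new[t])
        else:
            dropped.append(t)
    return kept, dropped


def next_token_id(hidden_state: np.ndarray, lm_head_pruned: np.ndarray,
                  keep: np.ndarray) -> int:
    """Greedy decoding over the pruned vocabulary, mapped back to original ids."""
    logits = lm_head_pruned @ hidden_state  # one logit per kept token
    return int(keep[int(np.argmax(logits))])


if __name__ == "__main__":
    # Illustrative (hypothetical) sizes: untied fp16 weights.
    print(vocab_param_bytes(128_000, 2048))  # 1048576000 bytes, about 1.05 GB
```

Under these assumptions the two vocabulary matrices alone take roughly 1 GB, and pruning shrinks both in proportion to the fraction of tokens kept. The trade-off the abstract points to is visible in `remap_input`: with a static subset, any prompt token outside it is discarded before prefill, regardless of the task.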