AI RESEARCH
N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
arXiv CS.AI
arXiv:2605.13190v1 Announce Type: cross Improving the inference efficiency of autoregressive transformers typically means reducing FLOPs per token, usually through approximations that degrade model quality. We