RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs

arXiv:2605.02946v1 Announce Type: new Safety alignment is critical for the responsible deployment of large language models (LLMs). As Mixture-of-Experts (MoE) architectures are increasingly adopted to scale model capacity, understanding their safety robustness becomes essential. Existing adversarial attacks, however, have notable limitations.