Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models

arXiv:2603.15271v1

Native unified multimodal models, which integrate both generation and understanding capabilities, face substantial computational overhead that hinders their real-world deployment. Existing acceleration techniques typically employ a static, monolithic strategy, ignoring the fundamental divergence in computational profiles between iterative generation tasks (e.g., image generation) and single-pass understanding tasks (e.g.