EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving

arXiv:2605.10556v1 Announce Type: cross As large language models span dense, mixture-of-experts, and state-space architectures and are deployed on heterogeneous accelerators under increasingly diverse multimodal workloads, optimizing inference energy has become as critical as optimizing latency and throughput. Existing approaches either treat latency as an energy proxy or rely on data-hungry black-box surrogates.