AI RESEARCH

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

arXiv CS.AI

arXiv:2604.09611v1 Announce Type: cross Large language models (LLMs) are increasingly used in applications that form multi-request workflows, such as document summarization, search-based copilots, and multi-agent programming. While these workflows unlock richer functionality, they also amplify latency and energy demand during inference. Existing measurement and benchmarking efforts either focus on assessing LLM inference systems or consider only single-request evaluations, overlooking the workflow dependencies and cross-request interactions unique to multi-request workflows.