Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference

arXiv:2603.20224v1

Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in request-heavy scenarios. In many real-world applications, the full scale and capabilities of LLMs are unnecessary, since Small Language Models (SLMs) can provide accurate responses for simpler text generation tasks.