Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference
arXiv CS.AI
arXiv:2602.19509v2
Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While "Oracle" models (e.g., Llama-3.3-70B) achieve state-of-the-art accuracy, they are prohibitively expensive for high-volume deployment. Smaller models (e.g., 7-9B parameters) are cost-effective but struggle with complex tasks.