In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs

arXiv:2512.02543v2 Announce Type: replace

Deploying LLM agents at scale typically requires choosing between quality and cost. Existing cost-reduction approaches fail to preserve agility: the ability to iterate rapidly without human time bottlenecks. Prompt engineering is brittle and slows iteration, while fine-tuning requires multi-day