AI RESEARCH

RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization

arXiv cs.LG

arXiv:2604.23577v1 Announce Type: cross

Serving diverse NLP workloads with large language models is costly: at one enterprise partner, inference costs exceeded $200K/month despite over 70% of queries being routine tasks well within the capability of smaller models. We present RouteNLP, a closed-loop framework that routes queries across a tiered model portfolio to minimize cost while satisfying per-task quality constraints.