Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference

arXiv:2603.13426v1 Announce Type: cross Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG from 0.869 to 0.940; on ToolBench (2,413 APIs), from 0.834 to 0.848.
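
The interpolation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`refine_tool_embeddings`, `rank_tools`), the interpolation weight `alpha`, the L2 normalization, and the cosine-similarity ranking are all assumptions made for illustration, since the abstract does not specify them. The point is only to show why the refinement is an offline step: the refined matrix has the same shape as the original, so serving-time lookup is unchanged.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (assumed convention)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def refine_tool_embeddings(tool_emb, query_emb, success_pairs, alpha=0.3):
    """Move each tool embedding toward the centroid of its successful queries.

    tool_emb:      (num_tools, dim) array of tool embeddings
    query_emb:     (num_queries, dim) array of historical query embeddings
    success_pairs: iterable of (query_index, tool_index) where the tool succeeded
    alpha:         hypothetical interpolation weight in [0, 1]; 0 leaves tools unchanged
    """
    tool_emb = l2_normalize(np.asarray(tool_emb, dtype=float))
    query_emb = l2_normalize(np.asarray(query_emb, dtype=float))
    refined = tool_emb.copy()

    # Group the successful query indices by tool.
    by_tool = {}
    for q, t in success_pairs:
        by_tool.setdefault(t, []).append(q)

    # Tools with no recorded successes keep their original embedding.
    for t, qs in by_tool.items():
        centroid = query_emb[qs].mean(axis=0)
        refined[t] = (1.0 - alpha) * tool_emb[t] + alpha * centroid

    # Same shape as the input: no new parameters at serving time.
    return l2_normalize(refined)


def rank_tools(query_vec, tool_emb):
    """Return tool indices sorted by descending cosine similarity to the query."""
    scores = tool_emb @ l2_normalize(np.asarray(query_vec, dtype=float))
    return np.argsort(-scores)


# Toy data: two tools, one historical query that succeeded with tool 0.
tools = np.array([[1.0, 0.0], [0.0, 1.0]])
queries = np.array([[0.6, 0.8]])
refined = refine_tool_embeddings(tools, queries, [(0, 0)], alpha=0.5)
```

In this toy case the unrefined embeddings rank tool 1 first for the query, while the refined embeddings rank tool 0 first, reflecting the recorded outcome. Routing still costs one matrix-vector product per request, as it did before refinement.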