LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

arXiv:2605.11301v1 Announce Type: new Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, and visual question answering, and they differ in cost and latency. Effective MLLM routing therefore requires more than estimating query difficulty: a router must match the multimodal requirements of the current image-question input with the capabilities of each candidate model. We propose LatentRouter, a router that formulates MLLM routing as counterfactual multimodal utility prediction.
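
To make the routing idea concrete, here is a minimal sketch of utility-based model selection: score every candidate before calling any of them, then pick the one with the highest predicted utility. This is not LatentRouter's architecture, which the abstract does not detail. The `Candidate` class, the hand-written skill scores, the linear utility formula, and the `cost_weight` parameter are all illustrative assumptions; a learned router would predict these utilities from the image-question input rather than read them from a table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate model description (illustrative only)."""
    name: str
    skills: dict[str, float]  # assumed per-capability scores in [0, 1]
    cost: float               # assumed normalized cost/latency penalty in [0, 1]


def predicted_utility(requirements: dict[str, float],
                      model: Candidate,
                      cost_weight: float = 0.5) -> float:
    """Estimate utility without running the model.

    Assumption: utility is the requirement-weighted skill match
    minus a weighted cost penalty.
    """
    total = sum(requirements.values())
    if total == 0:
        match = 0.0
    else:
        match = sum(weight * model.skills.get(cap, 0.0)
                    for cap, weight in requirements.items()) / total
    return match - cost_weight * model.cost


def route(requirements: dict[str, float],
          candidates: list[Candidate],
          cost_weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest predicted utility."""
    return max(candidates,
               key=lambda m: predicted_utility(requirements, m, cost_weight))


# Made-up candidates and scores, for illustration only.
candidates = [
    Candidate("ocr_specialist", {"ocr": 0.9, "spatial": 0.4}, cost=0.2),
    Candidate("generalist", {"ocr": 0.6, "spatial": 0.8}, cost=0.6),
]

ocr_choice = route({"ocr": 1.0}, candidates)
spatial_choice = route({"spatial": 1.0}, candidates)
print(ocr_choice.name, spatial_choice.name)
```

In this toy setup the choice depends on what the query requires, not on how hard it is: the same two candidates are ranked differently for an OCR-heavy query and a spatial-reasoning query.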