AI RESEARCH
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
arXiv CS.AI
arXiv:2603.11545v1 Announce Type: cross We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dynamically decomposes user queries, delegates subtasks to modality-appropriate tools (e.g., object detection, OCR, speech transcription), and synthesizes results through adaptive routing strategies rather than predetermined decision trees.
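To make the decompose, delegate, and synthesize loop concrete, here is a minimal Python sketch. It is an illustration only and not the authors' implementation: the `Supervisor` class, the `Subtask` type, and the stub tools (`detect_objects`, `run_ocr`, `transcribe_speech`) are all hypothetical names. Routing here is a plain registry lookup keyed by modality, standing in for the adaptive routing the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    modality: str  # e.g. "image", "audio", "document"
    payload: str   # hypothetical reference to the input, e.g. a file name


# Hypothetical stand-ins for real tools; each just returns a description.
def detect_objects(payload: str) -> str:
    return f"objects detected in {payload}"


def run_ocr(payload: str) -> str:
    return f"text extracted from {payload}"


def transcribe_speech(payload: str) -> str:
    return f"transcript of {payload}"


class Supervisor:
    """Toy supervisor: decomposes a query, routes subtasks, merges results."""

    def __init__(self) -> None:
        # Registry mapping a modality to its tool; tools can be added at
        # runtime instead of being hard-coded into a fixed decision tree.
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, modality: str, tool: Callable[[str], str]) -> None:
        self.tools[modality] = tool

    def decompose(self, attachments: Dict[str, str]) -> List[Subtask]:
        # Assumption: one subtask per attached input. A real system would
        # also analyse the query text to plan subtasks.
        return [Subtask(m, p) for m, p in attachments.items()]

    def route(self, task: Subtask) -> str:
        # Delegate to the tool registered for this modality, if any.
        tool = self.tools.get(task.modality)
        if tool is None:
            return f"no tool registered for modality '{task.modality}'"
        return tool(task.payload)

    def answer(self, query: str, attachments: Dict[str, str]) -> str:
        # Synthesis step: here simply a concatenation of tool outputs.
        results = [self.route(t) for t in self.decompose(attachments)]
        return f"{query} -> " + "; ".join(results)


if __name__ == "__main__":
    supervisor = Supervisor()
    supervisor.register("image", detect_objects)
    supervisor.register("document", run_ocr)
    supervisor.register("audio", transcribe_speech)
    print(supervisor.answer(
        "What is shown?",
        {"image": "photo.png", "document": "scan.pdf"},
    ))
```

Keeping the tools in a registry means a new modality can be supported by registering one more callable, and an unsupported modality degrades to a clear message rather than an exception.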