Unified Multimodal Visual Tracking with Dual Mixture-of-Experts

arXiv:2605.03716v1 Announce Type: new

Multimodal visual object tracking can be divided into several kinds of tasks based on the input modality (e.g., RGB and RGB+X tracking). Existing methods often train a separate model for each modality or adapt pretrained models to new modalities, which limits efficiency, scalability, and usability. Thus, we