Tri-Modal Fusion Transformers for UAV-based Object Detection

arXiv cs.CV

arXiv:2604.16630v1. Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri-modal framework that processes RGB, thermal, and event data with a dual-stream hierarchical vision transformer.