Tri-Modal Fusion Transformers for UAV-based Object Detection
arXiv CS.CV
arXiv:2604.16630v1
Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri-modal framework that processes RGB, thermal, and event data with a dual-stream hierarchical vision transformer.