AI RESEARCH

Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

arXiv CS.CV

arXiv:2604.11234v1. Text-guided multispectral object detection uses text semantics to guide semantic-aware cross-modal interaction between RGB and infrared (IR) images for robust perception. However, notable limitations remain: (1) existing methods often use text only as an auxiliary semantic enhancement signal, without exploiting its guiding role to bridge the inherent granularity asymmetry between RGB and IR; and (2) conventional data-driven, attention-based fusion tends to emphasize stable consensus while overlooking potentially valuable cross-modal discrepancies.