Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection
arXiv cs.CV
arXiv:2604.11234v1 Announce Type: new

Text-guided multispectral object detection uses text semantics to guide semantic-aware cross-modal interaction between RGB and infrared (IR) images for robust perception. However, two notable limitations remain: (1) existing methods often use text only as an auxiliary signal for semantic enhancement, without exploiting its guiding role to bridge the inherent granularity asymmetry between RGB and IR; and (2) conventional data-driven, attention-based fusion tends to emphasize stable cross-modal consensus while overlooking potentially valuable cross-modal discrepancies.
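Limitation (2) can be made concrete with a toy example. The sketch below is not the paper's method and does not implement attention. It assumes hypothetical definitions of "consensus" (the elementwise mean of paired RGB and IR features) and "discrepancy" (the elementwise IR minus RGB difference), and it uses a plain symmetric average as a deliberately simplified stand-in for consensus-oriented fusion. All function names are illustrative only.

```python
from typing import List, Tuple

Vector = List[float]


def decompose(rgb: Vector, ir: Vector) -> Tuple[Vector, Vector]:
    """Split paired per-location RGB/IR features into two parts.

    consensus   = elementwise mean of the two modalities (what they agree on)
    discrepancy = elementwise IR - RGB difference (where they disagree)
    Both definitions are hypothetical, chosen only for illustration.
    """
    if len(rgb) != len(ir):
        raise ValueError("RGB and IR feature vectors must have equal length")
    consensus = [(r + i) / 2.0 for r, i in zip(rgb, ir)]
    discrepancy = [i - r for r, i in zip(rgb, ir)]
    return consensus, discrepancy


def reconstruct(consensus: Vector, discrepancy: Vector) -> Tuple[Vector, Vector]:
    """Invert decompose(): the (consensus, discrepancy) pair loses no information."""
    rgb = [c - d / 2.0 for c, d in zip(consensus, discrepancy)]
    ir = [c + d / 2.0 for c, d in zip(consensus, discrepancy)]
    return rgb, ir


def consensus_only_fusion(rgb: Vector, ir: Vector) -> Vector:
    """Simplified stand-in for consensus-oriented fusion: a symmetric average.

    Any discrepancy between the two modalities cancels out of the result.
    """
    consensus, _ = decompose(rgb, ir)
    return consensus


def consensus_plus_discrepancy_fusion(rgb: Vector, ir: Vector) -> Vector:
    """Hypothetical alternative: keep both terms by concatenating them."""
    consensus, discrepancy = decompose(rgb, ir)
    return consensus + discrepancy


if __name__ == "__main__":
    # Case A: the modalities disagree channel-wise (a cue present in only one modality).
    rgb_a, ir_a = [1.0, 0.0, 0.5], [0.0, 1.0, 0.5]
    # Case B: the modalities agree exactly.
    rgb_b, ir_b = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]

    # Averaging maps both cases to the same fused vector ...
    print(consensus_only_fusion(rgb_a, ir_a))  # [0.5, 0.5, 0.5]
    print(consensus_only_fusion(rgb_b, ir_b))  # [0.5, 0.5, 0.5]
    # ... while keeping the discrepancy term tells them apart.
    print(consensus_plus_discrepancy_fusion(rgb_a, ir_a))  # [0.5, 0.5, 0.5, -1.0, 1.0, 0.0]
    print(consensus_plus_discrepancy_fusion(rgb_b, ir_b))  # [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]
```

Under these assumptions, an input pair that disagrees and one that agrees produce identical averaged features, whereas retaining the discrepancy term keeps them distinguishable; reconstruct() shows that the consensus/discrepancy pair preserves all of the original information.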