AI RESEARCH

Brain-Inspired Multimodal Spiking Neural Network for Image-Text Retrieval

arXiv cs.CV

arXiv:2603.26787v1. Spiking neural networks (SNNs) have recently shown strong potential in unimodal visual and textual tasks, yet building a directly trained, low-energy, high-performance SNN for multimodal applications such as image-text retrieval (ITR) remains highly challenging. Existing artificial neural network (ANN)-based methods often pursue richer unimodal semantics through deeper and more complex architectures, while overlooking cross-modal interaction, retrieval latency, and energy efficiency.