CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection

arXiv:2604.03329v1. Violence detection benefits from audio, but real-world soundscapes can be noisy or only weakly related to the visible scene. We present CoLoRSMamba, a directional video-to-audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA.
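
To make "CLS-guided conditional LoRA" concrete, here is a minimal sketch of one possible reading; the abstract does not spell out the mechanism, so everything below is an assumption. It supposes that the video branch's CLS embedding is mapped to a sigmoid gate over the LoRA rank dimension, and that this gate scales the low-rank update applied to a frozen audio-branch weight. The names `conditional_lora`, `W`, `A`, `B`, `G`, and `alpha` are hypothetical and do not come from the paper.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conditional_lora(x, cls_video, W, A, B, G, alpha=1.0):
    """Assumed form: y = W x + (alpha / r) * B (g * (A x)), g = sigmoid(G c).

    x         : audio-branch input, length d_in
    cls_video : video CLS embedding c, length d_cls (the conditioning signal)
    W         : frozen base weight, d_out x d_in
    A, B      : LoRA down/up projections, r x d_in and d_out x r
    G         : hypothetical gate projection, r x d_cls
    """
    r = len(A)
    # One gate value per LoRA rank, driven only by the video CLS embedding.
    gate = [sigmoid(z) for z in matvec(G, cls_video)]
    # Down-project the audio input, then scale each rank by its gate.
    low = [g * h for g, h in zip(gate, matvec(A, x))]
    delta = matvec(B, low)
    base = matvec(W, x)
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


# Tiny example: rank 1, identity base weight, zero gate projection (gate = 0.5).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
G = [[0.0, 0.0]]
y = conditional_lora([1.0, 2.0], [0.3, -0.7], W, A, B, G)
```

In this sketch the conditioning flows one way only, from video to audio, which matches the "directional" description in the abstract; how the actual model places the adapters inside the Mamba blocks is not stated here.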