Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

arXiv:2604.24602v1

Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study this failure mode through a majorization view of multimodal posteriors and cast adaptation as a constrained de-mixing problem on the fused prediction.
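
The failure mode described above can be illustrated with a small numerical sketch. It is not the paper's method: the product-of-experts fusion rule, the helper names (`entropy`, `majorizes`, `fuse`), and the toy posteriors are all assumptions chosen for illustration. It uses the standard definition of majorization (for distributions sorted in descending order, every partial sum of one is at least that of the other) and the fact that Shannon entropy is Schur-concave, so a posterior that majorizes another has entropy no higher than it.

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q: each partial sum of p sorted in
    descending order is at least the matching partial sum of q."""
    cum_p = cum_q = 0.0
    for a, b in zip(sorted(p, reverse=True), sorted(q, reverse=True)):
        cum_p += a
        cum_q += b
        if cum_p < cum_q - tol:
            return False
    return True

def fuse(p_img, p_txt):
    """Normalized product of the two posteriors (an assumed fusion rule,
    used here only as a stand-in for the model's actual fusion)."""
    prod = [a * b for a, b in zip(p_img, p_txt)]
    z = sum(prod)
    return [x / z for x in prod]

# Hypothetical three-class example in which class 0 is the true label.
TRUE_CLASS = 0
p_img = [0.50, 0.30, 0.20]   # visual branch: correct but not very confident
p_txt = [0.05, 0.90, 0.05]   # textual branch: shifted, confidently wrong

p_fused = fuse(p_img, p_txt)
pred_img = p_img.index(max(p_img))
pred_fused = p_fused.index(max(p_fused))

print("fused posterior:", [round(x, 3) for x in p_fused])
print("fused majorizes visual:", majorizes(p_fused, p_img))
print("entropy visual / fused:", entropy(p_img), entropy(p_fused))
print("visual correct:", pred_img == TRUE_CLASS,
      "| fused correct:", pred_fused == TRUE_CLASS)
```

In this toy case the fused posterior is sharper than the visual one in the majorization order and has lower entropy, yet its top class is wrong because the shifted textual branch dominates the product. An objective that only minimizes the entropy of the fused prediction would reward exactly this outcome.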