ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding

arXiv:2603.19517v1 (Announce Type: cross)

Everyday photographs taken with ordinary cameras are already widely used in telemedicine and other online health conversations, yet no comprehensive benchmark evaluates whether vision-language models can interpret their medical content. Analyzing these images requires both fine-grained natural-image understanding and domain-specific medical reasoning, a combination that challenges general-purpose and specialized models alike. We