From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature

arXiv:2512.02566v2 Announce Type: replace
There is growing interest in developing strong biomedical vision-language models. A popular approach to achieving robust representations is to use web-scale scientific data. However, current biomedical vision-language pre