AI RESEARCH
CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging
arXiv CS.CV
arXiv:2604.22989v1 Announce Type: new Recent medical multimodal foundation models are built as multimodal LLMs (MLLMs) by connecting a CLIP-pretrained vision encoder to an LLM using LLaVA-style finetuning. This two-stage, decoupled approach