UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking

arXiv:2603.13893v1 Announce Type: cross

Vision-Language Models (VLMs) have emerged as powerful tools for image understanding tasks, yet their practical deployment remains hindered by significant architectural heterogeneity across model families. This paper