MARVIS: Modality Adaptive Reasoning over VISualizations

arXiv:2507.01544v2 Announce Type: replace Predictive applications of machine learning often rely on small (sub-1B-parameter) specialized models tuned to particular domains or modalities. Such models often achieve excellent performance but lack flexibility. LLMs and VLMs offer versatility but typically underperform specialized predictors, especially on non-traditional modalities and long-tail domains.