Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

arXiv:2605.01402v1 Announce Type: cross Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM